NAVAL  POSTGRADUATE  SCHOOL 
MONTEREY,  CALIFORNIA 


DIPHONE-BASED  SPEECH  RECOGNITION  USING 
NEURAL  NETWORKS 


Mark  E.  Cantrell 


June,  1996 


Coadvisors: 


Dan  C.  Boger 
Robert  B.  McGhee 


Approved  for  public  release;  distribution  is  unlimited 


19960912  027 


REPORT  DOCUMENTATION  PAGE 


Form  Approved  OMB  No.  0704-0188 


blic  reporting  burden  for  this  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instruction,  searching  existing 
lata  sources,  gathering  and  maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  oi 
y  other  aspect  of  this  collection  of  information,  including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information 
aerations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington,  VA  22202-4302,  and  to  the  Office  of  Management  and  Budget,  Paperwork  Reduction 
oject  (0704-0188)  Washington  DC  20503. 


LAGENCY  USE  ONLY  {Leave  blank) 


2.  REPORT  DATE 
JUNE  1996 


3.REPORT  TYPE  AND  DATES  COVERED 

Master’s  Thesis 


klTTLE  AND  SUBTITLE 

DIPHONE-BASED  SPEECH  RECOGNITION  USING  NEURAL 
NETWORKS 


). 

^UTHOR(S)  Cantrell,  Mark  E. 


'.PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Naval  Postgraduate  School 
Monterey  CA  93943-5000 


.SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 


5.FUNDING  NUMBERS 


8.PERFORMING  ORGANIZATION 
REPORT  NUMBER 


lO.SPONSORING/MONITORING 
AGENCY  REPORT  NUMBER 


11.SUPPLEMENTARY  NOTES 

The  views  expressed  in  this  thesis  are  those  of  the  author  and  do  not  reflect  the  official  policy  or  position  of 
the  Department  of  Defense  or  the  U.S.  Government. 


12a.  DISTRIBUnON/AVAILABILITY  STATEMENT 


pproved  for  public  release;  distribution  is  unlimited. 


12b.DISTRIBUTION  CODE 


Speaker-independent  automatic  speech  recognition  (ASR)  is  a  problem  of  long-standing  interest  to  the  Department  of 
Defense.  Unfortunately,  existing  systems  are  still  too  limited  in  capability  for  many  military  purposes.  Most 
large-vocabulary  systems  use  phonemes  (individual  speech  sounds,  including  vowels  and  consonants)  as  recognition  units. 
This  research  explores  the  use  of  diphones  (pairings  of  phonemes)  as  recognition  units.  Diphones  are  acoustically  easier  to 
recognize  because  coarticulation  effects  between  the  diphone's  phonemes  become  recognition  features,  rather  than 
confounding  variables  as  in  phoneme  recognition.  Also,  diphones  carry  more  information  than  phonemes,  giving  the 
lexical  analyzer  two  chances  to  detect  every  phoneme  in  the  word.  Research  results  confirm  these  theoretical  advantages. 
In  testing  with  4490  speech  samples  from  163  speakers,  70.2%  of  157  test  diphones  were  correctly  identified  by  one 
trained  neural  network.  In  the  same  tests,  the  correct  diphone  was  one  of  the  top  three  outputs  89.0%  of  the  time.  During 
word  recognition  tests,  the  correct  word  was  detected  85%  of  the  time  in  continuous  speech.  Of  those  detections,  the 
correct  diphone  was  ranked  first  41.6%  of  the  time  and  among  the  top  six  74%  of  the  time.  In  addition,  new  methods  of 
pitch-based  frequency  normalization  and  network  feedback-  based  time  alignment  are  introduced.  Both  of  these  techniques 
improved  recognition  accuracy  on  male  and  female  speech  samples  from  all  eight  dialect  regions  in  the  U.S.  In  one  test 
set,  frequency  normalization  reduced  errors  by  34%.  Similarly,  feedback-based  time  alignment  reduced  another  network's 
test  set  errors  from  32.8%  to  1 1.0%. 

14.SUBJECT  TERMS 

Automatic  speech  recognition,  diphone,  neural  network,  speaker  independent, 
continuous  speech 


17.SECURITY 
CLASSIFICATION  OF 
REPORT 

hclassified 


18.SECURITY 

CLASSIFICATION  OF  THIS 
PAGE 

Unclassified 


19.SECURITY 
CLASSIFICATION  OF 
ABSTRACT 

Unclassified 


16.PRICECODE 


20.LIMITATION  OF 
ABSTRACT 


NSN  7540-01-280-5500 


1 


Standard  Form  298  (Rev.  2-89) 

Prescribed  by  ANSI  Std.  239-18  298-102 


Approved  for  public  release;  distribution  is  unlimited. 
DIPHONE-BASED  SPEECH  RECOGNITION  USING  NEURAL  NETWORKS 


Mark  E.  Cantrell 

Major,  United  States  Marine  Corps 
B.S.,  Oregon  State  University,  1982 

Submitted  in  partial  fulfillment 
of  the  requirements  for  the  degrees  of 

MASTER  OF  SCIENCE  IN  SYSTEMS  TECHNOLOGY 

and 

MASTER  OF  SCIENCE  IN  COMPUTER  SCIENCE 

from  the 

NAVAL  POSTGRADUATE  SCHOOL 
June  1996 


Author: 


Approved  by: 


Ted  Lewis,  Chairman  Department  of  Computer  Science 


Robert  B.  McGhee,  Coadvisor 


111 


ABSTRACT 


Speaker-independent  automatic  speech  recognition  (ASR)  is  a  problem  of 
long-standing  interest  to  the  Department  of  Defense.  Unfortunately,  existing  systems  are 
still  too  limited  in  capability  for  many  military  purposes.  Most  large-vocabulary  systems 
use  phonemes  (individual  speech  sounds,  including  vowels  and  consonants)  as 
recognition  units.  This  research  explores  the  use  of  diphones  (pairings  of  phonemes)  as 
recognition  units.  Diphones  are  acoustically  easier  to  recognize  because  coarticulation 
effects  between  the  diphone's  phonemes  become  recognition  features,  rather  than 
confounding  variables  as  in  phoneme  recognition.  Also,  diphones  carry  more  information 
than  phonemes,  giving  the  lexical  analyzer  two  chances  to  detect  every  phoneme  in  the 
word.  Research  results  confirm  these  theoretical  advantages.  In  testing  with  4490  speech 
samples  from  163  speakers,  70.2%  of  157  test  diphones  were  correctly  identified  by  one 
trained  neural  network.  In  the  same  tests,  the  correct  diphone  was  one  of  the  top  three 
outputs  89.0%  of  the  time.  During  word  recognition  tests,  the  correct  word  was  detected 
85%  of  the  time  in  continuous  speech.  Of  those  detections,  the  correct  diphone  was 
ranked  first  41.6%  of  the  time  and  among  the  top  six  74%  of  the  time.  In  addition,  new 
methods  of  pitch-based  frequency  normalization  and  network  feedback-  based  time 
alignment  are  introduced.  Both  of  these  techniques  improved  recognition  accuracy  on 
male  and  female  speech  samples  from  all  eight  dialect  regions  in  the  U.S.  In  one  test  set, 
frequency  normalization  reduced  errors  by  34%.  Similarly,  feedback-based  time 
alignment  reduced  another  network's  test  set  errors  from  32.8%  to  1 1.0%. 
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EXECUTIVE  SUMMARY 


The  Department  of  Defense  (DoD)  has  long  recognized  the  potential  value  of  automatic 
speech  recognition.  In  fact,  the  DoD,  often  acting  through  the  Defense  Advanced  Research 
Projects  Agency  (DARPA),  has  funded  or  facilitated  much  of  the  most  significant  speech 
recognition  research  over  the  past  few  decades.  As  conunand,  control,  communications, 
computer,  and  intelligence  (C4I)  systems  continue  to  swell  in  importance,  speech  recognition  is 
likely  to  assume  an  increasingly  important  role.  Unfortunately,  existing  systems  are  still  too 
limited  in  capability  for  many  military  purposes. 

Most  large-vocabulary  systems  use  phonemes  (individual  speech  sounds,  including 
vowels  and  consonants)  as  recognition  units.  This  architecture  may  place  an  upper  limit  on  the 
performance  of  these  systems.  This  research  explores  the  use  of  diphones  (pairings  of 
phonemes)  as  recognition  units.  Diphones  are  acoustically  easier  to  recognize  because 
coarticulation  effects  between  the  diphone's  phonemes  become  recognition  features,  rather  than 
confounding  variables  as  in  phoneme  recognition.  Also,  diphones  carry  more  information  than 
phonemes,  giving  higher  levels  in  the  recognition  system  two  chances  to  detect  every  phoneme  in 
the  word.  Research  results  confirm  these  theoretical  advantages. 

In  testing  with  4490  speech  samples  from  163  speakers,  70.2%  of  157  test  diphones  were 
correctly  identified  by  one  trained  neural  network.  In  the  same  tests,  the  correct  diphone  was  one 
of  the  top  three  outputs  89.0%  of  the  time.  During  word  recognition  tests,  the  correct  word  was 
detected  85%  of  the  time  in  continuous  speech.  Of  those  detections,  the  correct  diphone  was 
ranked  first  41.6%  of  the  time  and  among  the  top  six  74%  of  the  time.  In  addition,  new  methods 
of  pitch-based  frequency  normalization  and  network  feedback-  based  time  alignment  are 
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introduced.  Both  of  these  techniques  improved  recognition  accuracy  on  male  and  female  speech 
samples  from  all  eight  dialect  regions  in  the  U.S.  In  one  test  set,  frequency  normalization 
reduced  errors  by  34%.  Similarly,  feedback-based  time  alignment  reduced  another  network's  test 
set  errors  from  32.8%  to  1 1.0%. 

These  results  imply  that  diphones  may  indeed  be  suitable  as  recognition  units  for  large 
vocabulary,  speaker  independent,  continuous  speech  recognition  systems. 
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I.  INTRODUCTION 


A.  SPEECH  RECOGNITION 

Science  fiction  writers  seldom  portray  the  keyboard  as  the  computer  input  device  of  the 
future.  Instead,  futurists  have  long  predicted  that  humans  will  one  day  interact  with  their 
computers  through  speech.  Presumably,  that  interaction  will  be  two-way  (and  the  human  will  be 
in  charge).  If  so,  the  computer  will  be  performing  both  speech  synthesis  and  automatic  speech 
recognition  (ASR).  The  latter  process  is  the  subject  of  the  research  described  in  this  thesis. 
Specifically,  this  research  involves  evaluation  of  the  diphone  (defined  in  Chapter  11)  as  a 
recognition  unit  for  ASR  systems. 

For  the  purposes  of  this  research,  automatic  speech  recognition  is  the  process  of  translating 
the  human  speech  signal  from  its  acoustic  form  into  more  something  meaningful  to  a  computer, 
such  as  ASCn  text.  Current  ASR  systems  can  be  classified  with  respect  to  the  following 
characteristics: 

•  Speaker  independence  -  speaker  independent  systems  are  designed  to  recognize 
speech  for  a  broad  class  of  speakers  (e.g.,  adult  American  males),  while  speaker 
dependent  systems  must  be  trained  by  the  intended  user. 

•  Vocabulary  size  -  actual  numbers,  when  given,  depend  on  the  author.  But  a  small 
vocabulary  system  would  probably  recognize  from  a  few  to  a  few  hundred  words  while 
large  vocabulary  systems  should  recognize  tens  of  thousands  of  words.  A  medium 
vocabulary  would  fall  somewhere  in  between. 

•  Discrete  versus  continuous  speech  -  in  normal  (continuous)  human  speech  the  words 
tend  to  flow  together  with  little,  if  any,  silence  between.  Discrete-utterance 
recognition  systems  require  that  the  speaker  pause  between  words.  Naturally, 
continuous  speech  recognition  systems  would  be  easier  to  use.  But  they  are  harder  to 
build. 
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Ideally,  a  speech  recognition  system  would  be  speaker  independent,  have  a  large 
vocabulary,  and  be  capable  of  recognizing  continuous  speech.  Unfortunately,  for  reasons  to  be 
discussed  later,  current  systems  essentially  force  us  to  choose  just  one  of  those  three 
characteristics. 

B.  THE  IMPORTANCE  OF  SPEECH  RECOGNITION 

It  is  not  difficult  to  imagine  applications  for  ASR.  Some  of  the  more  obvious  include: 

•  Voice  dictation  and  transcription  -  doctors  and  dentists  could  record  their 
observations  while  examining  their  patients.  Fewer  details  would  be  forgotten  and  the 
results  would  be  legible. 

•  Remote  control  -  eveiything  from  alarm  clocks  to  robots  could  be  controlled  without 
hunting  for  a  hand-held  transmitter. 

•  Machine  operation  in  hostile  environments  -  where  gloves  would  otherwise  interfere. 

•  Assistance  for  the  handicapped  -  quadriplegics  could  operate  wheelchairs  and  the 
deaf  attend  lectures. 

•  Friendlier  user  interfaces  -  databases  and  computer  expert  systems  would  be  more 
accessible  to  less  technically-oriented  users. 

•  Public  information  systems  -  Voice-operated  telephone  switchboards  could  replace 
touch-tone  menu  systems  and  computers  could  "man"  information  booths  where  (or 
when)  humans  are  too  costly. 

•  Cockpit  systems  -  pilots  could  operate  avionics  and  weapons  without  removing  their 
hands  from  the  throttle  and  stick. 

•  Military  intelligence  systems  -  computers  could  use  word  spotting  algorithms  to  sift 
through  monitored  communications,  flagging  only  the  most  useful  for  further  study  by 
human  translators. 
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There  are,  of  course,  many  other  potential  uses  for  speech  recognition  technology.  Many 
applications  will  probably  only  come  to  light  when  more  capable  systems  are  finally  developed. 

C.  RELEVANCE  TO  THE  DEPARTMENT  OF  DEFENSE 

The  Department  of  Defense  (DoD)  has  long  recognized  the  potential  value  of  automatic 
speech  recognition.  In  fact,  the  DoD,  often  acting  through  the  Defense  Advanced  Research 
Projects  Agency  (DARPA),  has  funded  or  facilitated  much  of  the  most  significant  speech 
recognition  research  over  the  past  few  decades.  As  command,  control,  communications, 
computer,  and  intelUgence  (C4I)  systems  continue  to  swell  in  importance,  speech  recognition  is 
likely  to  assume  an  increasingly  important  role. 

D.  SCOPE  OF  RESEARCH 

Automatic  speech  recognition,  as  discussed  in  more  depth  later,  is  a  more  challenging  goal 
than  one  might  expect.  Therefore,  as  is  so  often  the  case  with  complex  problems,  ASR  is  better 
approached  as  a  series  of  smaller  and  more  manageable  subproblems.  Specifically,  existing 
systems  typically  operate  at  acoustic,  lexical,  syntactic,  and  (to  a  lesser  degree)  semantic  and 
pragmatic  levels  .  Acoustic  analysis  consists  of  translating  the  acoustic  waveform  into  a  series  of 
speech  symbols,  such  as  phonemes  (individual  speech  sounds,  discussed  in  more  detail  later). 
Lexical  analysis  consists  of  translating  the  speech  symbols  into  word  hypotheses.  Syntactic 
analysis  uses  the  rules  of  sentence  structure  to  choose  from  among  the  words  hypothesized  by  the 
lexical  analyzer.  The  semantic  level  uses  knowledge  about  the  domain  of  discourse  to  refine  the 
sentence  hypotheses  provided  by  lower  levels.  Finally,  pragmatic  analysis  involves  using  world 
knowledge  to  achieve  what  can  best  be  called  "understanding"  of  the  utterance.  The  last  two 
levels,  semantic  and  pragmatic,  clearly  require  what  can  best  be  termed  artificial  intelligence. 
Thus,  except  for  rudimentary  forms  (usually  domain  specific)  of  semantic  analysis,  satisfactory 
implementations  of  the  last  two  levels  are  likely  to  remain  distant  goals.  [Ref.  1] 

This  research  primarily  involves  ASR  at  the  acoustic  and  lexical  levels,  although  the  other 
levels  are  considered  and  receive  some  mention.  The  following  specific  research  questions  are 
addressed: 
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•  Is  the  diphone  a  suitable  choice  as  the  recognition  unit  (speech  symbol)  for  ASR 
systems? 

•  Can  artificial  neural  networks  recognize  diphones  in  the  acoustic  signal? 

•  What  can  be  done  to  improve  recognition  accuracy? 

•  Will  accuracy  in  identifying  the  selected  recognition  unit  translate  into  high  word 
recognition  accuracy? 

E.  ORGANIZATION  OF  THESIS 

Following  this  introductory  chapter,  background  information  is  provided  on  speech 
recognition  and  neural  networks  in  Chapter  II.  The  research  methodology  used,  and  the  rationale 
behind  it,  is  covered  in  Chapter  HI.  The  results  are  summarized  in  Chapter  IV  and  conclusions  in 
Chapter  V.  Finally,  Chapter  VI  offers  recommendations  for  further  research. 
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II.  BACKGROUND 


A.  INTRODUCTION 

One  cannot  hope  to  build  a  working  speech  recognition  system  without  knowing  something 
about  speech.  Nor  can  one  hope  to  use  neural  networks  effectively  without  knowing  something 
about  how  they  work.  Therefore,  this  chapter  will  provide  a  brief  overview  of  human  speech  and 
the  challenges  it  presents.  Then,  the  fundamentals  of  back-propagation  neural  networks  will  be 
introduced.  The  research  methodology  and  research  results,  presented  in  later  chapters,  should 
make  more  sense  in  the  light  of  this  information.  ' 

B.  SPEECH  RECOGNITION 

1.  Production  Mechanisms 

Figure  2.1  is  a  simplified  block  diagram  of  the  human  speech  production  system  [after  Ref. 
1].  Voiced  speech  is  produced  by  forcing  air  from  the  lungs  through  the  vocal  folds  (or  cords). 
The  air  causes  the  vocal  folds  to  vibrate  and  produce  sounds  dominated  by  a  fundamental 
frequency,  perceived  as  the  speaker's  pitch.  The  sound  of  the  vocal  fold  vibrations  must  then 
pass  through  several  cavities  which  serve  as  resonance  chambers,  reinforcing  some  frequencies 
while  suppressing  others.  Finally,  the  air  and  the  sound  must  pass  several  speech-related  organs, 
known  as  articulators,  which  further  modify  the  speech  waveform.  Voicing,  also  known  as 
phonation,  produces  all  of  the  vowels  and  is  involved  directly  in  some  of  the  consonants. 
Semivowels  {liquids  such  as  the  "w"  in  "wet"  and  glides  such  as  the  "y"  in  "yam")  are  essentially 
formed  by  transitioning  between  vowel-like  sounds. 

Consonants,  some  voiced  and  some  unvoiced,  are  also  produced  by  forcing  air  past  the 
articulators  and  resonance  chambers  of  the  vocal  tract.  But  the  vocal  folds  remain  silent  in  the 
case  of  unvoiced  consonants.  For  example,  unvoiced /ricarives  (such  as  the  "s"  in  "lease")  are 
produced  by  forcing  air  through  some  constriction  in  the  vocal  tract  other  than  the  vocal  folds 
(such  as  a  constriction  formed  between  the  tongue  and  teeth).  Plosives,  also  known  as  stops,  are 
produced  by  building  air  pressure  behind  a  constriction  and  then  suddenly  releasing  the  pressure 
to  produce  sound.  The  "t"  and  "p"  in  "stop"  are  both  plosives.  The  nasals  (such  as  the  "n"  in 
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"noise")  are  produced  through  a  combination  of  phonation  and  opening  of  the  velum  so  that  the 
sound  passes  through  both  oral  and  nasal  cavities.  Finally,  affricates  ("ch"  in  "change"  and  "j"  in 
"jam")  are  formed  by  transitions  between  stops  and  fricatives. 


Figure  2.1.  A  Block  Diagram  of  the  Speech  Production  System  [after  Ref.  1]. 


The  classification  of  speech  sounds  by  their  method  of  production  is  useful  because  the 
speech  signal  contains  acoustic  clues  as  to  the  method  of  a  sound's  production.  Those  clues  may, 
in  turn,  be  used  to  determine  what  words  were  spoken. 

Unfortunately,  conventional  alphabets  are  unsuitable  for  unambiguously  referring  to 
particular  speech  sounds.  So  phoneticians  have  defined  a  set  of  atomic  speech  units  known  as 
phonemes*,  as  well  as  several  phonetic  alphabets  for  referring  to  those  phonemes.  The 
International  Phonetic  Alphabet  (IP A),  developed  by  a  group  of  European  phoneticians  in  1888, 
has  enough  symbols  to  describe  all  of  the  phonemes  in  all  of  the  world's  languages.  But  the  IPA 

*  Phoneticians  actually  distinguish  between  the  phoneme,  a  theoretical  sound  unit,  and 
phones,  the  sounds  actually  produced  by  a  speaker.  Associated  with  each  phoneme  is  a  set  of 
phones,  called  allophones,  which  constitute  acoustics  variations  of  that  phoneme.  The  term 
phoneme  is  often  loosely  used,  in  this  thesis  and  elsewhere,  to  refer  to  all  three  speech  units. 
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is  most  suited  to  hand  transcription,  since  it  is  not  practical  for  use  with  conventional  keyboards 
and  printers.  In  the  United  States,  the  ARP  Abet  (developed  under  the  auspices  of  ARP  A)  is  more 
widely  used.  The  version  of  the  uppercase  ARP  Abet  used  in  the  TMIT  lexicon  (described  later) 
is  reproduced  in  Table  2.1.  The  symbols  listed  in  Table  2.1  have  been  used  (in  bold  upper  case 
letters)  to  refer  to  specific  phonemes  in  the  remainder  of  this  thesis. 


SYMBOL 

EXAMPLE 

■ 

EXAMPLE 

Stops: 

B 

Bee 

■ 

Glides: 

R 

Ray 

D 

Day 

Y 

Yacht 

G 

Gay 

Africates: 

JH 

Joke 

P 

Pea 

CH 

CHoke 

T 

Tea 

Vowels: 

lY 

bEEt 

K 

Key 

IH 

bit 

Fricatives: 

S 

Sea 

EH 

bEt 

SH 

She 

EY 

bAlt 

Z 

Zone 

AE 

bAt 

ZH 

aZure 

AA 

bottom 

F 

Fin 

AW 

about 

TH 

THin 

AY 

bite 

V 

Van 

AH 

cr 

C 

DH 

THen 

AO 

bOUGHt 

M 

Mom 

OY 

bOY 

N 

Noon 

OW 

bOAt 

NG 

siNG 

UH 

bOOk 

EM 

bottOM 

UW 

boot 

EN 

buttON 

ER 

bIRd 

ENG 

washINGton 

AX 

about 

Semivowel: 

EL 

bottLE 

IX 

debit 

Liquids: 

L 

Lay 

AXR 

butter 

W 

Way 

Table  2.1. 

The  Timit  Lexicon's  Phonetic  Alphabet  [after  Ref.  2]. 

Before  leaving  the  topic  of  speech  production,  it  is  useful  to  look  at  speech  production 
from  the  communications  system  perspective  (as  suggested  by  Parsons  [Ref.  3]).  Specifically, 


the  lungs  and  vocal  folds  provide  excitation  (the  carrier)  and  the  speech  articulators  provide 
modulation.  As  with  communications  systems,  the  information  content  is  in  the  modulation 
produced  by  the  articulators,  not  the  excitation  produced  by  the  vocal  folds. 

2.  Characteristics  of  the  Speech  Signal 

The  vocal  fold  vibrations  mentioned  earlier  do  not  produce  anything  like  a  clean  sinusoidal 
waveform.  Instead,  the  fundamental  frequency  is  accompanied  by  many  harmonics.  Articulator 
effects  and  time-varying  changes  in  the  size  and  shape  of  resonance  chambers  further  complicate 
the  signal.  Figures  2.2  and  2.3,  showing  just  a  few  seconds  of  speech,  illustrate  how  enormously 
complex  the  speech  waveform  is.  The  time-varying  nature  of  the  signal's  amplitude  is  obvious 
from  the  speech  "envelope"  in  Figure  2.2.  But  Figure  2.3  shows  that  the  frequency  varies  widely 
with  time  as  well. 


"GENTLY  SUBDUED"  SPOKEN  BY  AN  ADULT  MALE 


3000 

2000 


Time  (sec) 


Figure  2.2.  An  Example  of  Digitized  Speech 
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A  vowel  (IX)  Followed  by  a  Consonant  (S) 

1500 

1000 

500 

CD 

■o 

t  0 

E 

< 

-500 

-1000 

-1500 

1 .46  1 .48  1 .5  1 .52  1 .54  1 .56  1 .58  1 .6  1 .62  1 .64  1 .66 

Time  (seconds) 

Figure  2.3.  Frequency  Variations  with  Time  (the  IX  S  sound,  as  in  "aegIS") 

Unfortunately,  the  time-domain  view  of  speech  (shown  in  Figures  2.2  and  2.3)  is  of  limited 
use  in  speech  recognition.  For  example,  it  may  appear  at  first  glance  that  four  words  were 
recorded  in  Figure  2.2.  But  the  waveform  includes  only  two  words.  And  the  word  break  is  at 
1.53  seconds,  rather  than  at  1.75  seconds,  as  one  might  expect.  A  stop  consonant  (B)  resulted  in 
a  longer  silence  (1.69  to  1.81  seconds)  than  is  found  at  the  actual  junction  between  words. 

Since  the  time  domain  is  so  troublesome,  it  is  natural  to  look  to  the  frequency  domain.  The 
invention  of  the  spectrograph  at  Bell  Laboratories  in  1941  first  made  this  practical  [Ref.  3]. 
Today,  Pulse  Code  Modulation  (another  Bell  Labs  invention)  and  Fast  Fourier  Transforms 
(FFTs)  allow  digital  signal  processors  and  computers  to,  in  real  time,  determine  the  frequency 
components  present  in  a  speech  waveform.  Figure  2.4,  produced  using  MATLAB's  FFT 
function,  shows  the  spectral  characteristics  of  20  milliseconds  of  voiced  human  speech  (the 
vowel  lY,  as  in  bEEt,  uttered  by  an  adult  male).  Note  the  high  frequency  combing  produced  by 
the  vocal  folds,  and  the  three  prominent  peaks  in  the  spectral  envelope.  The  largest  peak,  and  the 
one  with  the  lowest  frequency,  is  at  300  Hz  in  Figure  2.4.  This  first  peak  is  known  as  the  first 
formant  (or  FI).  Its  position,  when  present,  is  determined  by  the  degree  of  oral  or  pharyngeal 
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constriction  of  the  vocal  tract.  The  second  formant  (F2)  is  at  about  2200  Hz  in  Figure  2.4.  When 
it  is  present,  the  second  formant's  position  is  determined  by  the  degree  of  constriction  produced 
by  the  front  or  back  of  the  tongue.  The  position  of  the  third  formant  (at  3000  Hz  in  the  same 
figure,  a  partial  blend  with  F2),  when  present,  is  largely  determined  by  vocal  tract  length.  The 
frequencies  of  all  three  formants  are  inversely  proportional  to  vocal  tract  length  and  any 
lip-rounding  during  articulation.  [Ref.  1] 


Figure  2.4.  Frequency  Components  of  a  Vowel  (lY  as  in  bEEt)  Spoken  by  a  Male 

Theoretically,  the  speech  signal  contains  an  infinite  number  of  formants,  just  as  it  contains 
an  infinite  number  of  harmonics.  But  the  higher  formants  become  increasingly  weak  and  the  first 
three  formants  (particularly  the  first  two)  carry  the  bulk  of  speech  information.  Similarly,  speech 
can  include  frequency  components  throughout  the  range  of  human  hearing  (about  15-20,000  Hz 
[Ref.  4]).  But  most  of  the  information  is  at  the  lower  frequencies.  This  is  evidenced  by  our 
ability  to  recognize  speech  over  telephone  systems  that  have  been  band-limited  to  between  300 
and  3000  Hz. 
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Because  articulator  movements  affect  formant  frequencies,  spectral  information  provides 
important  speech  recognition  clues,  at  least  for  vowels.  In  fact,  the  frequencies  and  intensities  of 
the  first  three  formants  can  be  used  to  identify  vowels.  Figure  2.5  (after  Ref.  1)  shows  the 
average  formant  locations  for  vowels  in  American  English. 


Figure  2.5.  Average  Formant  Frequencies  for  Vowels  in  American  English  [after  Ref.  1]. 


Unfortunately,  the  unvoiced  consonants  are  more  difficult  to  identify  from  their  spectral 
characteristics.  Figure  2.6,  a  spectrogram  of  the  S  consonant,  illustrates  this  fact.  The  spectral 
envelope  lacks  the  formant  peaks  characteristic  of  voiced  sounds.  Instead,  the  spectrum  is 
dominated  by  higher  frequencies  (a  noise-like  hissing  in  the  case  of  an  S).  In  fact,  the  best  clues 
as  to  the  identity  of  a  consonant  are  often  found  in  the  formant  transitions  of  neighboring  vowels 
[Ref.  3,5].  To  a  lesser  extent,  the  speech  envelope  in  the  time  domain  is  also  of  some  use  in 
consonant  identification.  For  example,  the  stop  gap  in  Figure  2.2  is  a  clue  that  a  plosive  follows. 
These  facts  will  become  important  in  later  discussions  of  the  recognition  unit  and  feature  vectors. 
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Semi-Logarithmic  Plot  of  Speech  Spectrum 


Figure  2.6.  Frequency  Spectrum  of  a  Consonant  (S)  Spoken  by  a  Male 

As  further  evidence  of  the  value  of  frequency  domain  analysis,  it  should  be  noted  that  at 
least  one  researcher,  Victor  Zue,  learned  to  read  spectrograms  with  greater  than  90  percent 
accuracy  [Ref.  3].  Furthermore,  he  was  able  to  teach  others  to  do  the  same. 

3.  Human  hearing 

When  setting  out  to  build  an  automatic  speech  recognition  system,  it  is  natural  to  ask  how 
humans  perform  the  same  task.  In  particular,  how  do  humans  perceive  pitch  or  frequency?  In  an 
effort  to  answer  this  question,  two  major  theories  were  developed  in  the  19th  century.  The  place 
theory,  advanced  by  Helmholz,  suggested  that  frequencies  are  encoded  by  the  place  of  activation 
of  hair  cell  receptors  in  the  cochlea  (the  spiral-shaped  tube  in  Figure  2.7).  Rutherford's  frequency 
theory,  on  the  other  hand,  suggested  that  frequency  is  encoded  by  the  frequency  of  discharge  of 
neurons  in  the  auditory  system.  The  issue  was  partially  settled  by  Georg  von  Bekesy,  who  was 
awarded  the  Nobel  Prize  in  1962  for  his  work.  Sound  produces  vibrations  of  the  tympanic 
membrane  (eardrum),  which  are  transferred  by  tiny  bones  (the  ossicles)  to  the  oval  window  at  one 
end  of  the  cochlea.  In  a  famous  series  of  experiments,  von  Bekesy  showed  that  vibrations  of  the 
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Figure  2.7.  Cross-Section  of  the  Human  Ear  and  Cochlea  [after  Ref.  21]. 
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oval  window  produce  standing  waves  in  the  fluid  filled  cochlea.  These  standing  waves  activate 
tiny  hair  cell  receptors  in  the  basilar  membrane  that  runs  the  length  of  the  cochlea.  The  sound's 
frequency  determines  the  point  of  maximum  displacement  of  the  basilar  membrane.  Higher 
frequencies,  for  example,  cause  standing  waves  with  maximum  amplitude  very  close  to  the  oval 
window.  Low  frequencies  activate  nerves  far  fi:om  the  oval  window.  The  place  theory  is  given 
further  credence  by  experiments  showing  that  individual  neurons  are  incapable  of  firing  more 
than  about  1000  times  per  second.  Yet  humans  are  capable  of  resolving  frequencies  up  to  about 
20,000  Hz.  So,  although  some  research  indicates  that  the  frequency  theory  is  valid  below  about 
1000  Hz,  the  place  theory  seems  to  best  explain  perception  of  higher  frequencies.  [Ref.  3,4] 

Although  we  now  know  quite  a  bit  about  the  function  of  the  ear  itself,  much  less  is  known 
about  the  higher  levels  in  the  auditory  system.  Unfortunately,  the  inner  ear  and  auditory 
processing  centers  of  the  brain  are  surrounded  by  the  skull,  brain  tissues,  and  other  critical 
structures  that  make  studies  on  living  patients  very  difficult.  However,  some  of  the  research  that 
has  been  done  implies  that  the  auditory  system  includes  feature  detectors  similar  to  those 
discovered  in  the  (more  accessible)  visual  cortex  of  the  brain.  The  visual  cortex  includes  simple 
feature  detectors,  such  as  edge  detectors  (neurons)  that  respond  to  edges  with  a  particular 
orientation  in  a  specific  portion  of  the  visual  field.  The  outputs  of  regions  of  these  neurons  feed 
into  complex  neurons.  The  outputs  of  complex  neurons  feed  into  hypercomplex  and  higher-order 
hypercomplex  neurons.  At  each  stage  of  processing,  the  neurons  are  responsible  for  recognizing 
more  complex  patterns  over  wider  regions  of  the  retina.  Researchers  at  Harvard  even  located  a 
neuron  in  a  monkey's  brain  that  responded  only  to  the  shape  of  a  hand.  [Ref.  4] 

If  we  assume  that  the  auditory  cortex  operates  like  the  visual  cortex,  it  should  contain 
feature  detectors.  Some  feature  detectors  might  respond  to  falling  pitch,  some  to  rising  pitch,  and 
others  to  temporal  combinations  of  both.  A  number  of  studies  support  these  assumptions.  It  is 
also  interesting  to  note  that  animals  with  damaged  auditory  cortexes  retain  the  ability  to 
recognize  pure  tones  but  lose  the  ability  to  recognize  more  complex  sounds  (such  as  melodies). 
[Ref.  6] 

It  is  encouraging  to  learn  of  the  ear's  ability  to  encode  speech  frequencies  at  such  a  low 
level  in  the  auditory  system.  This  ability  implies  that  the  frequency  domain  may  play  a  key  role 
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in  human  speech  decoding.  What  we  know  of  the  auditory  cortex  further  supports  this  theory.  In 
any  case,  the  frequency  domain  has  been  used  extensively  in  speech  research. 

Before  leaving  the  topic  of  human  hearing,  it  would  be  worthwhile  to  briefly  discuss  its 
selectivity.  As  mentioned  earlier,  humans  are  capable  of  detecting  tones  from  about  20  to  20,000 
Hz.  But  the  ear  is  most  sensitive  to  some  frequencies  than  others.  Research  has  shown  that  the 
ear  is  more  sensitive  to  frequencies  between  about  500  and  5000  Hz  [Ref.  3].  Perhaps  not 
coincidentally,  this  is  also  the  range  where  most  of  the  information  content  of  speech  resides. 

In  addition  to  nonuniform  sensitivity,  humans  exhibit  nonuniform  pitch  perception.  For 
example.  Middle  C  on  the  piano  is  261.6  Hz  while  the  C  one  octave  higher  is  at  twice  that 
frequency,  523.2  Hz.  Yet  the  second  C  does  not  sound  like  twice  the  pitch  [Ref.  4].  More  will 
be  said  of  this  in  Chapter  HI.  For  now  it  is  probably  enough  to  say  that  humans  can  detect  very 
slight  changes  in  frequency  below  about  1000  Hz.  But  absolute  sensitivity  to  a  change  in 
frequency  falls  off  approximately  logarithmically  at  higher  frequencies.  Again,  the  increased 
sensitivity  in  the  speech  band  is  interesting. 

4.  Why  Speech  Recognition  is  Difficult 

The  information  presented  so  far  might  lead  one  to  believe  that  speech  recognition  is  a 
relatively  straightforward  process.  It  would  seem  that  the  FFT  and  an  expanded  version  of 
Figure  2.5  could  turn  speech  recognition  into  a  fairly  simple  pattern  recognition  problem.  In  fact, 
this  illusion  of  simplicity  has  been  deceiving  researchers,  and  those  who  fund  research,  since  at 
least  the  1930s.  According  to  the  brief  history  presented  by  Parsons  [Ref.  3],  the  earliest  serious 
attempts  to  build  speech  recognition  hardware  were  part  of  an  effort  to  replace  human 
switchboard  operators  in  the  rapidly  expanding  phone  system.  When  this  effort  failed  to  produce 
a  system  that  could  recognize  digits  spoken  by  a  variety  of  people,  the  rotary  dialing  and 
switching  system  was  developed.  In  the  1950s,  many  incorrectly  predicted  that  the  digital 
computer  would  soon  make  automatic  speech  recognition  possible.  Since  then,  many  advances 
have  been  made.  But  we  are  still  a  long  way  from  duplicating  the  speech  recognition  capabilities 
of  humans. 

The  following  subsections  detail  some  of  the  problems  plaguing  automatic  speech 
recognition  systems. 
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a.  Coarticulation 


Coarticulation  is  one  of  the  most  difficult  challenges  facing  speech  systems.  There 
are  approximately  45  phonemes  in  American  English  (actual  numbers  vary  slightly  in  the 
literature).  Unfortunately,  the  acoustic  characteristics  of  a  given  phoneme  can  vary  widely 
depending  on  its  phonetic  context.  For  example,  the  second  D  and  the  Y  in  the  sentence,  "Did 
you  find  her?"  can  sound  more  like  a  JH  in  continuous  speech  [after  Ref.  1].  Coarticulation 
occurs  because  the  articulators  must  move  rapidly  between  the  target  positions  for  each  phoneme. 
In  fact,  they  often  never  reach  the  target  position  for  one  phoneme  before  they  start  transitioning 
to  the  next.  Thus,  speech  recognition  systems  must  cope  with  a  very  large  number  of  actual 
speech  sounds.  This  is  far  more  challenging  than  identifying  45  or  so  phonemes  in  isolation. 

b.  Segmentation 

Another  problem  in  speech  systems  is  in  locating  the  boundaries  between  words  and 
between  phonemes.  As  mentioned  earlier,  there  is  frequently  no  pause  whatsoever  between 
words.  And  coarticulation  makes  for  extremely  fuzzy  boundaries  between  phonemes.  So  a 
speech  recognition  system  must  recognize  patterns  of  varying  lengths  in  a  rapidly  changing 
acoustic  stream.  This  problem  partially  accounts  for  the  failure  of  some  phoneme  recognizors  to 
perform  nearly  as  well  on  words.  It  is  one  thing  to  identify  a  hand  segmented  slice  of  speech.  It 
is  quite  another  to  spot  the  same  phoneme  in  a  waveform  when  the  system  lacks  a  priori 
knowledge  of  where  to  start  and  stop  looking. 

c.  Speaker  Variations 

Obviously,  accents,  dialects,  and  speech  pathologies  will  present  problems  for  speech 
recognition  systems.  But  more  fundamental  differences  between  speakers  also  wreak  havoc  with 
recognition  systems.  As  mentioned  earlier,  formant  frequencies  tend  to  be  inversely  proportional 
to  vocal  tract  length.  Thus,  adult  males,  with  vocal  tracts  an  average  of  about  30  percent  longer 
than  women,  exhibit  formant  frequencies  that  are  lower  than  females  for  the  same  phoneme 
[Ref.  7].  These  individual  variations  cause  overlap  between  formant  positions  for  vowels  of 
different  classes.  Humans  apparently  adjust  rapidly  to  the  formant  frequencies  of  a  particular 
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speaker  [Ref.  7]  and  some  researchers  have  attempted  to  build  recognition  systems  that  do  the 
same  [Ref.  8]. 

In  addition  to  vocal  tract-related  differences,  recognition  systems  must  cope  with 
variations  in  speech  rate,  volume,  inflection,  and  pitch.  Even  separate  utterances  by  the  same 
speaker  will  vary  considerably  from  time  to  time.  For  example,  a  pilot  handling  an  in-flight 
emergency  might  speak  more  rapidly  and  at  a  higher  pitch  than  normal. 

d.  Confusability  and  Ambiguity 

The  easiest  way  to  introduce  confusability  and  ambiguity  is  with  a  few  examples 
[after  Ref.  7]: 

•  Homonyms  -  "red"  versus  "read"  or  "two"  versus  "to" 

•  Word  segmentation  -  "I  scream  for  ice  cream."  versus  "ice  cream  for  ice  cream"  or 
"gray  tape"  versus  "great  ape"  or  "light  housekeeper"  versus  "lighthouse  keeper" 

•  Syntactic  ambiguity  -  "The  boy  jumped  over  the  stream  with  the  fish."  (Did  the  boy 
have  a  fish  or  did  the  stream?) 

•  Semantic  ambiguity  -  "The  council  decided  to  subsidize  the  port."  (Is  this  an 
alcoholic  or  salt  water  port?)  or  "He  saw  that  gas  can  bum."  (Is  "can"  a  verb  or  a 
noun?  [Ref.  1]) 

A  certain  amount  of  world  knowledge  is  needed  to  make  sense  out  of  these  examples 
in  a  given  context.  Thus,  artificial  intelligence  techniques  must  be  used  to  attack  such  problems. 
However,  artificial  intelligence  techniques  will  be  more  likely  to  succeed  if  lower  levels  (acoustic 
and  lexical  analysis)  of  the  recognition  system  perform  well. 

e.  Noise 

Ultimately,  speech  recognition  systems  may  be  judged  by  their  ability  to  handle 
noise.  Humans  do  a  superb  job  of  understanding  speech  under  exceptionally  noisy  conditions. 
Even  interfering  speech,  the  worst  possible  form  of  noise,  is  easily  handled  by  humans.  This  is 
evidenced  by  the  well-known  cocktail  party  effect,  our  ability  to  isolate  and  listen  to  one 
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speaker's  voice  in  a  crowd.  [Ref.  3]  But  current  speech  recognition  systems  are  hard-pressed  to 
perform  well  even  on  clean  speech. 

5.  Existing  Commercial  Systems 

Existing  commercial  systems  typically  use  hidden  Markov  models  (HMMs)  at  the  heart  of 
their  recognition  engines.  The  HMM  dates  back  to  the  1950s,  when  statisticians  were  looking  for 
ways  to  characterize  random  processes  for  which  they  had  only  incomplete  observations.  They 
developed  the  HMM  to  model  these  problems  as  doubly  stochastic  processes.  In  an  HMM,  the 
observed  data  is  considered  to  be  the  result  of  passing  the  "true"  (hidden)  process  through  a 
"censor,"  resulting  in  a  second  observed  process.  Both  processes  are  characterized  using  only  the 
one  that  can  be  observed.  The  HMM  was  introduced  to  the  speech  recognition  field  in  the  1970's 
through  the  independent  work  of  Baker  at  Camegie-Mellon  and  Jelinek  and  his  colleagues  at 
IBM.  [Ref.  1] 

Hidden  Markov  Model  speech  recognition  systems  operate  by  using  a  training  process  to 
build  a  statistical  model  for  each  word  (or  other  utterance)  in  the  vocabulary.  Each  model  is 
imagined  to  be  a  finite  state  machine  capable  of  generating  observation  strings.  During 
recognition,  given  an  actual  observation  string,  the  system  determines  the  likelihood  that  each 
word's  HMM  produced  that  particular  observation  string.  The  word  with  the  highest  likelihood 
is  declared  to  be  the  word  recognized.  [Ref.  1] 

Dragon  Systems  and  IBM  currently  offer  two  of  the  most  well-known  commercial  speech 
recognition  systems.  The  following  specifications  were  taken  from  Internet  Home  Pages 
describing  their  respective  products  (as  of  May  1996): 

a.  Dragon  Systems  DragonDictate  2.0  for  Windows  (Power  Edition) 

•  60,000  word  primary  (in  RAM)  dictionary 

•  120,000  word  backup  (on  disk)  dictionary 

•  Up  to  64.000  word  user  dictionary 

•  Domain-specific  dictionaries  available 
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•  Discrete-speech  -  QuickTalk  feature  allows  user  to  shorten  the  pause  between  words. 
Dragon  Systems  calls  this  semi-connected  speech.  Also,  some  commands  and 
numbers  may  be  spoken  in  a  traly  continuous  manner. 

•  Speaker  Adaptable  -  Dragon  Systems  claims  reasonably  accurate  recognition  "out  of 
the  box."  Training  or  "enrollment"  is  optional.  The  system  automatically  adapts  to  the 
user  during  use  (over  time). 

•  16  MB  RAM  required,  20  MB  recommended  (8  MB  dedicated  to  DragonDictate). 

•  42  MB  hard  disk  for  system  (minimum)  plus  14  MB  per  user  (for  speaker-dependent 
data). 

•  DSP  card  not  required  but  recommended  for  computationally-intensive  applications 
(uses  ordinary  sound  card  otherwise). 

b.  IBM  VoiceType  Dictation  for  Windows 

•  22,000  word  general-purpose  dictionary 

•  Up  to  2.000  word  user  dictionary 

•  Domain-specific  dictionaries  available  (including  20,000  Radiology  words,  16,000 
Emergency  Medicine  words,  30,000  Journalism  words,  or  25,000  Legal  words) 

•  Discrete-speech  -  IBM  states  that  a  pause  between  words  of  as  little  as  1/10  of  a 
second  is  sufficient.  They  also  claim  that  users  can  achieve  dictation  rates  of  at  least 
70  to  100  words  per  minute 

•  Speaker  Adaptable  -  A  brief  "enrollment"  period  is  required. 

•  12  MB  RAM  required,  16  MB  recommended  (8  MB  dedicated  to  Dragon  Dictate). 

•  33  MB  hard  disk  for  system  (minimum)  plus  14  MB  per  user. 
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•  IBM  claims  that  accuracy  percentages  in  the  high  90s  are  not  uncommon.  (No 
accuracy  figures  were  offered  in  the  Dragon  Systems  home  page.) 

•  IBM  DSP  card  required. 

Not  surprisingly,  neither  home  page  offered  details  on  the  recognition  algorithms  used. 
But  the  product  descriptions  used  terms  implying  that  HMMs  are  used  for  language  and  word 
models  in  both  systems.  To  put  some  of  the  product  specifications  in  perspective,  consider  the 
following  facts: 

•  The  most  commonly  used  words  make  up  about  75%  of  typical  text  samples.  A 
vocabulary  of  5000  words  is  sufficient  for  about  95%  of  typical  text.  The  remaining 
5%  of  words  will  depend  on  the  user  and  the  domain  of  discourse.  [Ref.  7] 

•  The  word  search  space  increases  exponentially  with  vocabulary  size.  So  both 
recognition  time  and  error  frequency  are  likely  to  increase  with  vocabulary  size. 

•  At  least  one  study  indicates  that  users  find  discrete  word  recognition  systems  quite 
acceptable  for  some  office  tasks  [Ref.  7].  This  is  not  surprising,  in  that  the  alternative 
to  careful  enunciation  may  be  "hunt  and  peck"  typing  for  many.  However,  the  general 
public  is  much  less  cooperative  with  someone  else's  automatic  recognition  system  than 
an  office  worker  would  be  with  his  own  system.  This  fact  has  plagued  telephone 
industry  researchers  since  the  earliest  days  of  speech  recognition  research. 

•  Both  systems  place  relatively  hefty  demands  on  the  personal  computer. 


Much  more  could  be  said  about  the  state  of  the  art,  and  speech  recognition  in  general.  But 
further  details  will  be  defeired  until  the  discussion  of  related  issues  in  subsequent  chapters.  For 
example,  previous  research  on  formant  normalization  will  be  included  in  Chapter  IE's  discussion 
of  that  topic. 
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C.  BACK-PROPAGATION  NEURAL  NETWORKS 


Having  covered  some  of  the  fundamentals  of  speech  recognition,  it  is  time  to  describe  the 
back-propagation  neural  network,  the  primary  pattern  recognition  tool  used  in  this  research. 

Artificial  neural  networks  are  an  attempt  to  emulate  the  massively  parallel  processing 
approach  used  by  the  human  brain.  Unfortunately,  we  still  know  far  too  little  about  biological 
neural  networks  themselves.  So  existing  artificial  neural  networks  (ANNs)  are  still  quite 
primitive.  Nevertheless,  they  have  proven  to  be  extremely  useful  in  a  wide  variety  of  fields.  And 
neural  network  research  is  continuing  on  many  fronts. 

No  attempt  will  be  made  to  provide  a  comprehensive  introduction  to  neural  networks.  The 
subject  has  inspired  hundreds  of  books  and  thousands  of  research  papers  over  the  last  50  years. 
Instead,  only  back-propagation  neural  networks  will  be  described.  This  architecture  has  been 
said  to  account  for  about  80%  to  90%  percent  of  practical  ANN  applications  to  date  [Ref.  9]. 

1.  MATLAB  Neural  Network  Toolbox 

The  bulk  of  this  research  was  conducted  using  the  MATLAB  Neural  Network  Toolbox,  a 
product  of  The  Mathworks,  Incorporated.  MATLAB  itself  offers  a  rich  collection  of  signal 
processing,  numeric  computation,  and  visualization  tools.  The  Neural  Network  Toolbox 
includes  a  variety  of  functions  specific  to  neural  network  applications.  MATLAB  also  includes  a 
powerful  script  language  for  developing  custom  algorithms  and  extending  the  toolbox. 

In  addition  to  the  Neural  Network  Toolbox,  the  MATLAB  code  in  Appendix  A  makes  use 
of  several  functions  from  the  Signal  Processing  and  Communications  Toolbox.  The  latter 
toolbox  was  developed  at  the  Naval  Postgraduate  School  to  support  coursework  in  speech  and 
signal  processing  [Ref.  10]. 

In  the  sections  and  chapters  that  follow,  equations  are  often  expressed  in  terms  of 
MATLAB  function  calls.  This  should  reduce  ambiguity  and  simplify  cross  referencing  of 
equations  with  code  in  Appendix  A.  Details  on  MATLAB  script  language  syntax  and  functions 
are  found  in  user's  guides  for  the  Neural  Network  Toolbox  and  MATLAB  itself  [Ref.  9].  The 
Neural  Network  Toolbox  and  its  User's  Guide  also  include  clear  and  concise  tutorials  on  neural 
networks,  as  well  as  demonstration  scripts  to  illustrate  concepts  and  facilitate  experimentation. 
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2.  The  Biological  Neuron 

A  typical  biological  neuron  is  pictured  in  Figure  2.8.  A  number  of  branch-like  projections 
called  dendrites  radiate  from  the  cell  body  (soma).  Neurons  receive  signals  from  other  neurons 
through  connections,  called  synapses,  at  the  dendrites  or  the  cell  body  itself.  Some  synapses  are 
excitory  and  tend  to  raise  the  activation  level  of  the  neuron.  Other  synapses  are  inhibitory  and 
lower  the  activation  level.  The  incoming  signals  arrive,  typically  as  asynchronous  pulse  trains, 
and  contribute  to  what  can  best  be  called  a  decaying  weighted  average.  When  the  neuron's 
activation  level  reaches  some  threshold,  it  fires,  sending  a  signal  to  other  neurons  via  its  axon  and 
axon  branches.  A  neuron  can  directly  receive  signals  (via  synapses)  from  up  to  10,000  or  so 
other  neurons.  Likewise,  it  can  send  signals  to  10,000  or  so  other  neurons  (including  connections 
to  its  own  dendrites  or  soma).  [Ref.  11] 


The  human  brain  contains  on  the  order  of  10  billion  total  neurons  (approximately  the 
number  of  stars  estimated  to  make  up  the  Milky  Way  galaxy).  These  neurons  form  a  network 
with  an  estimated  60  trillion  interconnections  between  neurons.  (By  contrast,  an  artificial  neural 
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network  is  considered  large  if  it  contains  a  few  thousand  neurons  and  a  few  million  connections.) 
Many  of  these  connections  form  during  the  first  two  years  of  life,  at  a  rate  of  up  to  one  million 
synapses  per  second.  The  mechanisms  behind  this  process,  and  its  exact  relationship  to  learning, 
are  largely  unknown.  [Ref.  11] 

The  neurons  in  the  cerebral  cortex  (the  portion  apparently  responsible  for  our  higher  mental 
processes)  are  arranged  into  a  wrinkled  sheet  two  to  three  millimeters  thick  and  about  2,200 
square  centimeters  in  area  (about  the  area  of  two  computer  keyboards).  Based  on  reaction  time 
studies  and  the  speed  at  which  biological  neurons  switch  (a  few  milliseconds),  it  has  been 
estimated  that  human  perceptual  decisions  cannot  involve  more  than  100  or  so  serial  steps 
[Ref  11]. 

3.  The  Artificial  Neuron 

Back-propagation  neural  networks  are  constructed  of  what  are  conunonly  called 
McCulloch  and  Pitts  (MP)  neurons,  named  in  honor  of  two  pioneering  researchers  at  the 
University  of  Chicago  in  the  1940s.  The  MP  neuron,  pictured  in  Figure  2.9,  is  a  greatly 
simplified  abstraction  of  the  biological  neuron.  The  inputs  are  multiplied  by  their  respective 
weights  and  then  added.  The  weighted  sum  is  then  passed  through  a  transfer  function  (also 
called  activation  function)  to  determine  if  the  neuron  will  "fire."  In  practice,  a  bias  input  is 
usually  added,  for  reasons  to  be  explained  later.  The  bias  input  is  always  one.  But  the  weight 
associated  with  the  bias  is  adjusted  during  training,  as  are  the  other  weights. 


Inputs  Weights 


Figure  2.9.  A  McCulloch  and  Pitts  Neuron  [after  Ref.  11]. 
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Neural  network  training  and  interconnection  strategies  are  introduced  later.  First,  it  is 
useful  to  see  how  a  single  MP  neuron  can  be  applied  to  simple  problems. 

4.  Artificial  Neural  Networks  as  Classifiers 

Figure  2.10  illustrates  a  simple  classification  problem.  Each  of  the  two  inputs  can  be  one 
or  zero.  There  are  four  possible  pairs  of  inputs,  commonly  known  as  input  vectors.  An  input 
vector  consisting  of  two  ones  should  be  classified  into  output  category  one  (plus  symbols  in  the 
figure).  All  other  input  vectors  should  result  in  an  output  of  zero  (circles  in  the  figure).  Clearly, 
this  is  a  logical  AND  operation.  So  we  can  certainly  solve  it  using  conventional  hardware  or 
programming  techniques.  But  it  is  useful  for  illustrating  how  neurons  operate. 


Figure  2.10.  A  simple  classification  problem  (logical  AND).  The  two  inputs  are  plotted  along  the  axes. 
Correct  output  classes  are  indicated  with  “+"  (true)  and  "o"  (false)  symbols. 
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It  is  immediately  obvious  from  Figure  2.10  that  the  input  vectors  can  be  properly  separated 
into  the  correct  output  classes  by  drawing  a  diagonal  line  through  the  input  space.  Recall  that  the 
general  form,  for  the  equation  of  a  line  is: 

Ax  +  By  +  C  =  0  (2.1) 

where  A,  B,  and  C  are  constants.  This  can  easily  be  translated  into  neural  network  form  by 
renaming  some  variables: 

+  XgWg  +  B  =  0  (2.2) 

where  x,  and  Xj  are  inputs,  W,  and  Wj  are  weights,  and  B  is  a  bias.  Thus,  a  MP  neuron  with  two 
inputs  can  be  used  to  represent  a  line  in  a  two-dimensional  input  space.  If  we  assume  a  threshold 
transfer  function,  an  input  vector  on  one  side  of  the  line  will  make  the  left  side  of  Equation  2.2 
negative  and  output  will  be  zero.  Input  vectors  falling  on  the  other  side  of  the  line  will  make  the 
left  side  of  the  equation  positive  and  the  output  one.  Inputs  falling  exactly  on  the  line  will  result 
in  a  sum  of  zero.  In  that  unlikely  case,  the  output  would  normally  also  be  zero.  Without  a  bias, 
all  classification  lines  would  necessarily  pass  through  the  origin.  Thus,  a  bias  is  usually  included 
to  allow  classification  lines  anywhere  in  the  input  space. 

At  this  point,  it  is  natural  to  wonder  how  the  weights  in  Equation  2.2  are  determined.  In 
practice,  they  are  usually  initialized  to  random  numbers  that  have  been  scaled  based  on  the  range 
of  input  values.  The  weights  are  then  adjusted  until  the  resulting  line  leads  to  the  correct 
classifications.  This  process  will  be  described  in  more  detail  later.  For  now,  it  is  sufficient  to 
observe  the  results.  Figure  2.1 1  shows  the  initial  (random)  classification  line  of  a  single  neuron. 
Figure  2.12  shows  the  final  line  defined  by  the  neuron's  weights  after  training 

If  a  third  input  is  added,  the  neuron's  sununation  formula  becomes: 

X^W,  +  X2W2  +  X3W3  +  B  =  0  (2.3) 

This,  of  course,  is  analogous  to  the  general  form  of  the  equation  for  a  plane.  Thus,  our 
input  space  becomes  three-dimensional  and  the  input  classes  are  separated  using  planes.  With 


25 
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four  or  more  inputs,  the  input  space  becomes  n-dimensional  hyperspace  and  the  input  classes  are 
separated  using  hyperplanes. 


Fig  2.12.  Classification  line  defined  by  the  AND  neuron’s  final  (trained)  weights. 
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5.  Linear  Separability  and  MultMayer  Networks 

The  McCulloch  and  Pitts  neuron,  after  a  brief  period  of  excitement  in  the  1940s,  received 
little  attention  until  the  1960s.  Then,  Rosenblatt's  perceptron  convergence  theorem  led  to  a 
flurry  of  research.  But  research,  and  funding,  largely  ended  in  the  early  1970s.  A  1969  book  by 
Minsky  and  Papert  is  usually  credited  (or  blamed)  for  the  dramatic  loss  of  interest  in  the 
perceptron  (usually  defined  to  be  a  simple  MP  neuron  with  a  threshold  transfer  function).  The 
most  important  point  made  by  Minsky  and  Papert  is  that  a  simple  perceptron  cannot  learn  to 
correctly  classify  input  vectors  unless  the  input  vectors  from  different  output  classes  are  linearly 
separable.  The  most  famous  example  of  a  non-linearly  separable  problem  is  the  exclusive-OR 
function.  Figure  2.13  depicts  the  possible  input  vectors  and  their  correct  classifications.  Note 
that  no  single  line  can  be  used  to  properly  divide  the  input  vectors  into  the  correct  output  classes. 
Some  researchers  suspected  that  multi-layer  networks  would  be  able  to  solve  this  problem.  But 
the  perceptron  training  method  would  not  allow  weight  corrections  to  propagate  through  multiple 
layers.  And  Minsky  expressed  doubt  that  multi-layer  perceptrons  would  be  useful,  even  if  they 
could  be  built.  [Ref.  12] 
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Figure  2.13.  A  non-linearly  separable  problem  (the  exclusive-OR  problem). 
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In  the  mid  1980s,  several  researchers*  more  or  less  simultaneously  proposed  solutions  to 
the  problem  of  training  multi-layer  neural  networks.  Their  back-propagation  learning  rule  allows 
weight  corrections  to  propagate  from  the  output  layer  back  through  one  or  more  hidden  layers. 
This  process  will  be  described  in  more  detail  in  the  next  section.  First,  it  is  necessary  to 
understand  how  MP  neurons  are  intercoimected  to  form  neural  networks.  Figure  2.14  shows  a 
neural  network  capable  of  solving  the  exclusive-or  problem.  The  network  consists  of  two  inputs, 
two  neurons  in  a  single  hidden  layer,  and  a  single  output  neuron.  Note  that  all  neurons  in  the 
first  hidden  layer  are  connected  to  all  of  the  inputs.  Also,  every  output  neuron  (only  one  in  this 
case)  receives  inputs  from  every  neuron  in  the  preceding  hidden  layer.  This  is  referred  to  as  a 
fully  connected  feed-forward  network.  Some  network  architectures  are  not  fully  connected  and 
some  involve  connections  between  neurons  of  the  same  layer  (or  from  a  neuron  to  itself).  But 
these  architectures  are  less  common  (at  least  in  artificial  neural  networks)  and  they  will  not  be 
discussed  further  here. 


Figure  2.14.  A  multi-layer  neural  network  capable  of  solving  the  excIusive-OR  problem. 


Figure  2.15  shows  the  two  lines  defined  by  the  two  hidden  layer  neurons  of  the  network  in 
Figure  2.14  (after  learning  to  solve  the  exclusive-OR  problem).  Figure  2.16  shows  the  outputs  of 
the  hidden  layer,  mapped  into  the  output  neuron's  input  space.  Note  that  this  new  input  space, 
though  still  two-dimensional,  is  not  the  same  as  the  original  input  space.  This  time,  one  axis  is 

*  Rumelhart,  Hinton,  &  Williams,  1986;  Parker,  1985;  LeCun,  1985;  and  Werbos,  whose 
doctoral  dissertation  of  1974  solved  the  problem  but  was  not  noticed  until  much  later. 
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the  first  hidden  node’s  output  and  the  other  hidden  node's  output  is  plotted  on  the  second  axis. 
The  output  classes  have  again  been  indicated  using  plus  symbols  or  circles.  Note  that  the  vectors 
are  now  linearly  separable.  The  line  in  Figure  2.16  was  plotted  using  the  actual  weights  of  a 
trained  network  (as  have  all  previous  classification  lines).  This  simple  example  illustrates  how 
multi-layer  neural  networks  solve  non-linearly  separable  problems.  They  use  hidden  neurons, 
often  called  feature  detectors,  to  map  the  input  vectors  into  a  new  input  space,  usually  of  different 
dimensionality.  Given  enough  hidden  neurons,  any  problem  becomes  linearly  separable.  This 
issue  will  be  addressed  in  more  detail  in  the  next  chapter. 


Figure  2.15.  Classification  lines  defined  by  hidden  layer  weights  for  exclusive-OR  problem. 


Figure  2.16.  Classification  line  of  output  neuron  plotted  in  the  output  space  of  hidden  layer. 
Note  that  the  hidden  layer  has  mapped  the  exclusive-OR  problem  into  a  new  two-dimensional 
space,  where  it  is  now  linearly  separable. 
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6.  Training  Methods 

Multi-layer  neural  networks  are  typically  simulated  in  conventional  computers  using 
matrices.  In  MATLAB's  Neural  Network  Toolbox,  a  weight  matrix  for  a  given  layer  consists  of 
an  m  by  n  matrix  where  m  represents  the  number  of  neurons  (layer  outputs)  and  n  is  the  number 
of  inputs.  Thus,  the  weight  given  to  input  i  by  neuron  j  is  found  in  matrix  position  (j,i). 

The  outputs  for  a  layer  are  computed  by  arranging  the  inputs  into  a  column  vector.  The 
input  vector  is  multiplied  by  the  weight  matrix  to  produce  an  output  vector.  The  transfer 
function  is  then  applied  to  each  element  of  the  output  vector  to  produce  the  final  output  vector. 
In  a  multi-layer  network,  this  output  vector  would  become  the  input  vector  for  the  next  layer. 
This  is  the  reason  for  the  term  "feed-forward  network."  In  matrix  notation,  the  process  is  quite 
simple: 


output  =  F(W*p  +  b) 


(2.4) 


where  F(x)  is  the  transfer  function  applied  to  each  element  of  matrix  x. 

For  example,  the  hidden  layer  outputs  for  the  exclusive-OR  network  described  earlier  are 
found  by: 


output  vector  =  F( 


-7.7068  7.2724  ' 

X 

’ll  ’ 

-h 

'  4.2454  ’ 

-8.6552  7.8783 

l2 

-3.8833 

(2.5) 


7.  The  Back-Propagation  Learning  Rule 

Neurons  with  a  threshold  transfer  function  are  unsuitable  for  multi-layer  networks  because 
it  is  impossible  to  determine  which  hidden  layer  neuron  weights  were  the  most  responsible  for 
incorrect  (or  correct)  outputs.  Thus,  it  is  impossible  to  apply  weight  changes  to  only  the  incorrect 
weights  because  those  weights  cannot  be  identified.  This  is  known  as  the  credit  assignment 
problem  [Ref.  13].  Back-propagation  solves  this  problem  by  using  a  differentiable  transfer 
function,  such  as  the  log-sigmoid  function: 

logsig(n)  =  1/(1 +e'")  (2.6) 
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With  a  differentiable  transfer  function,  the  network  can  properly  assign  credit  for  wrong 
answers,  and  reduce  more  heavily  the  input  weights  that  contributed  most  to  that  incorrect 
output.  Note  in  the  following  discussion  that  the  back-propagation  learning  rule  is  really  a  form 
of  gradient  descent.  The  elements  of  the  delta  vectors  (Equations  7  and  9)  serve  as  error  surface 
partial  derivatives.  Training  proceeds  as  follows  [Ref.  14]: 

•  An  input  vector  from  the  training  set  is  presented  to  the  first  hidden  layer's  inputs.  The 
results  feed  forward  through  each  layer  as  described  earlier. 

•  The  network's  actual  outputs  are  compared  to  the  desired  outputs.  The  output  layer's 
weight  corrections  are  based  on  the  difference,  an  error  vector.  For  output  neuron  j,  a 
delta  value  must  first  be  computed  using  the  error  vector  (in  parentheses),  the 
weighted  sum  of  the  neuron's  inputs,  and  the  transfer  function's  derivative: 

6j  =  (t  arg  etj  -  outputj )f'(S  Wili)  (2.7) 

where  f  is  the  derivative  of  the  transfer  function. 

•  The  output  neuron's  delta  value  is  then  used  to  determine  the  change  to  apply  to  each 
input  weight  Wj;  as  follows: 

AWji=T|6jIi  (2.8) 

where  T|  is  the  learning  rate,  typically  between  .25  and  .75,  and  Ij  is  the  input 
associated  with  the  weight  being  adjusted.  In  practice,  the  learning  rate  is  often  adjusted 
dynamically  as  training  proceeds.  This  research  was  conducted  using  the  adaptive  learning  rate 
algorithms  included  in  the  MATLAB  Neural  Network  Toolbox. 

•  The  hidden  layer  weight  adjustments  are  slightly  more  complicated.  To  compute  the 
delta  value  for  a  hidden  neuron,  the  delta  values  of  the  output  neurons  are  multiplied 
by  the  weights  of  their  connections  to  that  hidden  neuron.  The  hidden  neuron's  delta  is 
thus  the  weighted  sum  of  the  output  layer  deltas.  For  hidden  neuron  k,  the  delta  value 
is: 

8k=(i8iWki)f'(rwiii)  (2.9) 
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where  ^  is  the  weight  between  this  hidden  neuron  and  output  neuron  j  and  f  (SWjlj ) 
is  again  the  transfer  function’s  derivative  for  the  sum  of  the  hidden  neuron's  weighted 

inputs. 

•  Once  the  hidden  neuron's  delta  value  has  been  computed,  its  weights  are  adjusted  as 
before  (using  Equation  2.8). 

•  If  the  network  has  more  than  one  hidden  layer,  the  error  corrections  continue  to 
back-propagate  in  this  manner  until  all  layers  have  been  adjusted. 

•  Once  the  weights  of  all  layers  have  been  adjusted,  the  next  training  input  vector  is 
presented  to  the  network  and  the  entire  is  process  repeated. 

•  A  momentum  term  a  is  often  added  to  Equation  2.8  to  helps  the  network  skip  over 
small  local  minima  [Ref.  13]: 

AWij(n)  =  aAWji(n  - 1)  +  (1  -  a)ii5j(n)Ii(n)  (2.10) 

In  this  form  of  the  learning  rule,  each  weight  is  adjusted  during  epoch  n  by  an 
amount  equal  to  the  weighted  average  of  the  previous  weight  change  (epoch  n  - 1)  and  the  weight 
change  dictated  by  Equation  2.8.  Thus,  the  network  follows  a  running  average  of  the  gradient 
rather  than  just  the  local  gradient.  If  the  momentum  constant  is  zero,  this  degenerates  to  pure 
gradient  descent  (Equation  2.8).  If  the  constant  is  one,  the  gradient  is  ignored  and  the  last  weight 
adjustment  is  repeated  (not  very  useful).  A  constant  of  between  0.90  and  0.95  (usually  the  latter) 
was  used  in  this  research. 

The  training  paradigm  just  described  is  known  as  supervised  learning.  The  training 
algorithm  must  know  each  training  vector's  correct  output  class  in  order  to  perform  the  weight 
adjustments.  Unsupervised  learning  algorithms  also  exist.  But  they  are  less  common  (at  least  in 
artificial  neural  networks)  and  they  will  not  be  discussed  here. 

8.  Neural  Networks  Versus  Conventional  Programming  Techniques 

The  previous  examples  are  problems  that  could  easily  be  solved  with  the  kind  of 
conditional  statements  found  in  every  programming  language.  But  consider  a  program  to  process 
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loan  applications.  One  might  initially  think  that  it  would  be  possible  to  simply  apply  thresholds 
to  each  of  the  inputs  and  approve  the  applications  where  all  or  a  certain  number  of  inputs  exceed 
the  threshold.  For  example,  a  recent  bankruptcy  could  be  used  to  justify  an  immediate  rejection. 
But  what  if  the  applicant  now  has  a  six-figure  income?  Humans  deal  easily  with  situational 
thresholds  and  fuzzy  logic.  But  the  thought  processes  used  by  humans  don't  easily  reduce  to 
nested  if/then  statements.  For  this  reason,  the  average  loan  officer  would  probably  find  it 
difficult  to  write  an  unambiguous  set  of  instructions  describing  how  to  evaluate  loan  applications. 

Artificial  neural  networks  deal  easily  with  problems  like  this.  In  fact,  some  financial 
institutions  currently  screen  loan  applications  using  neural  networks.  The  networks  are  trained 
by  repeatedly  presenting  training  data  (such  as  actual  loan  applications)  and  comparing  the 
network's  output  with  the  "correct"  output  (such  as  an  indication  of  whether  the  applicant 
defaulted  on  the  loan).  If  the  network's  output  is  incorrect,  the  weights  that  contributed  to  the 
incorrect  output  are  reduced.  Training  then  resumes  until  the  error  rate  becomes  acceptable. 

9.  Artificial  Versus  Biological  Neural  Networks 

It  is  somewhat  traditional,  and  wise,  to  include  a  disclaimer  in  descriptions  of  artificial 
neural  networks.  Specifically,  we  must  remember  that  ANNs  bear  little  real  resemblance  to  the 
biological  neural  networks  that  inspired  them.  This  is  particularly  true  of  back-propagation 
neural  networks  (BPNNs).  Some  of  the  key  differences  include: 

•  BPNNs  operate  in  a  synchronous  feed-forward  manner.  All  inputs  are  sampled  at  the 
same  time.  The  first  layer's  weights  are  then  all  calculated  and  synchronously  applied 
to  the  next  layer's  inputs.  Biological  networks,  on  the  other  hand,  operate 
asynchronously.  A  biological  neural  network’s  neurons  integrate  their  inputs  over 
time,  firing  asynchronously.  The  resulting  activity  patterns  bear  little  resemblance  to 
the  orderly  lock-step  operation  of  ANNs. 

•  The  neurons  in  a  BPNN  layer  are  typically  fully  connected  with  preceding  layers  but 
do  not  receive  inputs  from  the  same  or  succeeding  layers.  Again,  some  ANNs  do 
employ  feedback  and  competition  within  layers.  But  the  majority  of  practical  ANN 
applications  have  used  the  feed-forward  architecture.  Biological  neural  networks,  on 
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the  other  hand,  often  employ  connections  that  imply  feedback  and  competition  within 
"layers."  Biological  networks  also  imply  much  more  varied,  and  far  less  symmetric, 
connection  strategies.  In  particular,  clusters  of  richly  interconnected  biological 
neurons  often  form  subnetworks,  a  very  different  strategy  from  the  monolithic 
fully-interconnected  ANN. 

•  The  supervised  learning  paradigm  described  earlier  is  biologically  implausible. 

Self-organizing  ANNs  have  been  built.  But  far  too  Uttle  is  known  about  the  mechanics 
of  human  learning  to  judge  how  close  those  ANNs  come  to  the  mechanisms  of  their 
biological  counterparts. 

This  disclaimer  is  certainly  not  intended  to  shght  the  importance  of  ANNs.  They  have 
proven  quite  useful  in  spite  of  their  simplicity.  And  it  may  never  be  necessary  to  exactly  mimic 
the  operation  of  the  brain.  After  all,  manmade  machines  seldom  exactly  duplicate  the  operation 
of  the  biological  structures  that  inspire  them.  Otherwise,  our  airplanes  would  have  feathers  and 
our  automobiles  would  travel  on  hooves. 

10.  Neural  Networks  and  Speech  Recognition 

The  commercial  speech  recognition  systems  described  earlier  are  quite  impressive 
compared  to  the  systems  of  just  a  few  years  ago.  But  no  existing  system  can  claim  to  have 
human-like  continuous  speech  recognition  abilities.  And  the  HMM  approach  has  disadvantages, 
some  theoretical  and  some  practical,  that  may  place  limits  on  its  long-term  usefulness.  One  such 
theoretical  disadvantage  is  the  necessary  assumption  that  the  features  extracted  within  a  phonetic 
segment  are  uncorrelated  with  each  other.  This  is  obviously  not  true  [Ref.  15].  A  more  practical 
disadvantage  is  that  large- vocabulary  HMMs  are  very  computationally  intensive,  as  evidenced  by 
the  hardware  requirements  described  earlier.  It  simply  may  not  be  possible  to  implement 
real-time  speaker-independent  large-vocabulary  continuous  speech  recognition  on  conventional 
(serial)  computers  using  HMMs  alone. 

With  the  HMM  perhaps  reaching  its  "Peter  principle"  limits,  many  researchers  have  turned 
to  the  artificial  neural  network.  Once  trained,  ANNs  have  the  potential  to  greatly  speed  up 
recognition  tasks  because  they  can  be  implemented  in  highly  parallel  hardware.  They  have  also 
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proven  to  be  well  suited  to  complex  pattern  recognition  tasks,  such  as  computer  vision  and 
speech  recognition. 

In  fact,  speech  recognition  is  prominent  in  nearly  every  author's  list  of  suitable  neural 
network  applications. 

In  the  last  ten  years,  hundreds  of  papers  have  been  written  describing  neural  network-based 
speech  research.  No  attempt  is  made  here  to  provide  a  comprehensive  summary  of  previous 
research.  But  the  following  papers  are  illustrative  of  the  scope  of  recent  research: 

•  Hirai  and  Waibel  developed  a  speaker-dependent  time-delay  neural  network  (TDNN*) 
for  Japanese  phoneme  spotting,  combined  with  a  higher  level  network  for  word 
recognition.  [Ref.  16] 

•  Wendell  and  Abdelhamied  used  modular  TDNNs  to  develop  a  phoneme  recognizor. 

In  a  modular  neural  network  the  recognition  problem  is  partitioned  into  subproblems 
and  subnetworks  are  trained  to  solve  particular  subproblems.  For  example,  one 
subnetwork  might  be  trained  to  identify  stop  consonants  and  another  vowels.  [Ref.  17] 

•  Matsuura,  Miyazawa,  and  Skinner  built  a  small  vocabulary  phoneme-based  neural 
network  word  recognizor.  [Ref.  18] 

•  Iso  and  Watanabe  investigated  the  use  of  demi-syllables  (described  later)  in  a 
large-vocabulary  speaker-dependent  Japanese  language  recognition  system.  [Ref.  19] 

•  Witbrock  and  Haffner  constructed  a  recognition  system  which  adapts  to  individual 
speakers  to  improve  recognition  accuracy.  A  Speaker  Voice  Code,  representing  a 
speaker's  individual  speech  characteristics,  was  presented  to  a  TDNN  along  with  more 
conventional  acoustic  inputs.  The  system's  accuracy  on  French  digits  was  in  excess  of 
99  percent.  [Ref.  8] 

•  Morgan  and  Bourlard,  and  others,  have  researched  the  use  of  hybrid  ANN/HMM 
architectures.  [Ref.  15] 

*  A  TDNN  neuron  receives  both  current  inputs  and  delayed  prior  inputs. 


•  Dalsgaard,  Anderson,  and  Jorgensen  applied  self-organizing  neural  networks  to  feature 
identification  in  continuous  speech.  [Ref.  20] 

•  Sivakumar  and  Robertson  showed  that  neural  network  consonant  recognition  could  be 
improved  by  using  formant  transitions  in  neighboring  vowels.  [Ref.  5] 

The  state  of  neural  network-based  speech  recognition  can  perhaps  best  be  summarized  by 
saying  that  the  field  is  young  but  promising. 

D.  SUMMARY 

This  chapter  has  summarized  some  of  the  fundamentals  of  speech  recognition  and  artificial 
neural  networks.  These  facts  will  be  referred  to,  and  expanded  on,  in  Chapter  HI.  In  particular, 
the  rationale  behind  the  research  methodology  is  explained  in  terms  of  these  fundamentals. 
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in.  METHODOLOGY  AND  RATIONALE 


A.  INTRODUCTION 

Chapter  HI  describes  the  specifics  of  how  this  research  was  conducted.  This  chapter  also 
offers  some  of  the  reasoning  behind  those  specifics.  In  particular,  the  following  sections  describe 
the  rationale  behind  the  choice  of  recognition  unit,  neural  network  feature  vector,  frequency 
normalization  method,  time  alignment  method,  neural  network  topology,  and  word  recognition 
algorithm. 

B.  CHOICE  OF  RECOGNITION  UNIT 

Probably  the  most  fundamental  decision  in  speech  recognition  system  design  is  the  choice 
of  recognition  unit.  Systems  can  be,  and  have  been,  designed  to  detect  words,  syllables, 
demi-syllables,  and  phonemes.  Another  recognition  unit,  the  diphone,  has  been  used  effectively 
in  speech  synthesis.  But  it  has  received  far  less  attention  in  speech  recognition.  The  diphone  is 
discussed  in  depth  later  in  this  section.  First,  the  alternative  recognition  units  are  described. 

1.  Words 

Small-vocabulary  word  recognizors  have  been  built.  However,  it  is  probably  impractical  to 
build  a  large-vocabulary,  speaker  independent  system  using  the  word  as  a  recognition  unit.  The 
most  obvious  problem  is  in  the  sheer  size  of  the  task.  The  system  would  be  required  to  recognize 
on  the  order  of  100,000  words.  In  order  to  achieve  speaker  independence,  the  system  would 
probably  have  to  train  on  hundreds  of  examples  of  each  word,  from  a  wide  variety  of  speakers. 
Just  assembling  the  training  and  test  data  would  be  a  daunting  task. 

Another  problem  with  words  is  in  their  length.  Longer  speech  segments  magnify  the 
impact  of  speaker  variabilities  such  as  speech  rate.  Also,  an  enormous  amount  of  storage  would 
be  required  to  hold  acoustic  templates  for  100,000  words.  The  situation  is  analogous  to 
assigning  a  unique  symbol  to  every  word  in  the  English  language  (rather  than  spelling  each  word 
with  combinations  of  26  letters).  If  humans  were  to  attempt  this,  we  would  no  doubt  have  to 
devote  a  great  deal  of  our  mental  storage  capacity  to  storing  images  of  those  symbols. 
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A  final  problem  is  in  segmentation.  As  mentioned  earlier,  words  acoustically  merge  in 
continuous  speech.  Word  recognition  systems  normally  get  around  this  problem  by  requiring 
discrete  utterances.  Continuous-speech  word  recognizors  would  have  to  spot  words  of  widely 
varying  length  in  a  continuous  speech  stream. 

2.  Syllable 

A  syllable  consists  of  a  vowel  (in  the  center)  and,  usually,  one  or  more  starting  and/or 
ending  consonants.  Syllables  are  attractive  because  they  are  relatively  easy  to  segment.  They  are 
identifiable  in  the  speech  envelope  by  the  large  amount  of  energy  (and  voicing)  associated  with 
the  vowel.  They  also  account  for  coarticulation  between  phonemes  in  various  phonetic  contexts 
(see  the  phoneme  paragraph  for  more  on  this  issue).  Unfortunately,  ambiguity  can  arise  in 
dealing  with  strings  of  consonants.  Specifically,  the  recognizor  must  decide  whether  to  group  the 
consonants  with  the  preceding  vowel  or  the  succeeding  one.  Another  problem  is,  again,  in  the 
size  of  the  search  space.  Though  fewer  in  number  than  words,  there  are  still  an  estimated  10,000 
syllables  in  the  English  language.  [Ref.  7] 

3.  Triphone 

The  triphone  is  a  combination  of  three  phonemes.  This  is  attractive  because  the  triphone 
would  also  account  for  coarticulation.  In  fact,  every  phoneme  could  be  modeled  in  every 
possible  phonetic  context.  But  the  numbers  are,  again,  quite  large.  If  we  use  the  45  phonemes  of 
Table  2-2,  plus  a  dummy  phoneme  to  represent  silence,  there  are  theoretically  46^  or  97336 
possible  triphones.  Undoubtedly,  far  fewer  combinations  actually  appear  in  English.  But  the 
number  is  still  rather  large,  probably  of  about  the  same  magnitude  as  the  number  of  syllables. 

4.  Demisyllable 

A  demisyllable  consists  of  half  of  a  syllable.  If  each  syllable  in  the  English  language  is 
divided  in  half  at  the  vowel,  the  10,000  English  syllables  can  be  reconstructed  from  the  resulting 
2(XX)  or  so  unique  demisyllables.  Thus,  the  number  of  required  recognition  units  can  be  reduced 
significantly  by  employing  demisyllables  instead  of  syllables.  As  mentioned  earlier,  some 
research  has  been  conducted  using  the  demisyllable  as  the  recognition  unit  [Ref.  19].  Though 
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very  promising,  the  demisyllable  requires  slightly  more  complicated  heuristics  than  some  of  the 
alternatives.  In  particular,  the  segmentation  algorithm  must  deal  with  consonant  clusters,  as 
described  earlier  for  syllables. 

5.  Phoneme 

The  phoneme  is  an  attractive  alternative  if  for  no  other  reason  than  the  small  numbers,  less 
than  50  in  English.  Also,  the  phoneme  is  tempting  because  of  the  large  body  of  knowledge  and 
theory  behind  it.  Phonemes  are  appealing  as  well,  from  a  practical  viewpoint,  because  a  large 
number  of  tools,  including  phonetically  transcribed  speech  databases  and  phonetic  dictionaries, 
have  been  built  around  the  phoneme.  Thus,  the  phoneme  is  the  most  commonly  used  sub-word 
recognition  unit.  Unfortunately,  the  phoneme  is  much  more  a  theoretical  unit  of  speech  than  a 
practical  one.  As  mentioned  in  the  section  on  coarticulation,  a  given  phoneme's  acoustic 
properties  depend  greatly  on  its  phonetic  context.  Thus,  a  recognition  system  must  force  some 
100  to  200  allophones  [Ref.  7]  into  50  or  so  phoneme  classes.  Another  challenge  is  in 
segmentation.  In  general,  there  is  no  sharp  dividing  line  between  phonemes  in  flowing  speech. 
This  causes  problems  both  in  training  and  during  recognition  (this  issue  is  discussed  in  more 
detail  in  the  next  section). 

C.  THE  DIPHONE 

The  final  sub-word  speech  unit  to  be  discussed  is  the  diphone,  the  recognition  unit  selected 
for  use  in  this  thesis  research.  A  diphone  consists  of  two  phonemes.  Theoretically,  there  are  46^ 
or  2116  possible  combinations  of  the  diphones  in  Table  2-1  (a  dummy  phoneme  is  again  used  to 
represent  silence).  But  not  all  of  these  actually  occur  in  English.  The  6229  entry  TIMIT  lexicon 
used  in  this  research  contains  only  1195  diphones.  And  the  100,000  word  Carnegie  Mellon 
phonetic  dictionary  contains  only  1331  distinct  diphones*.  Furthermore,  these  diphones  are  by  no 
means  evenly  distributed.  As  indicated  in  Figure  3.1,  some  diphones  are  far  more  common  than 
others.  In  fact,  the  ninety  most  common  diphones,  just  7.5  percent  of  the  total,  account  for  over 
50  percent  of  the  diphone  occurrences  in  the  TIMIT  lexicon.  Similarly,  the  490  most  common 

*  These  numbers  include  all  diphones  formed  by  pairing  phonemes  with  a  period  of  silence, 
as  at  the  beginning  or  end  of  an  isolated  word.  The  numbers  were  obtained  using 
DIPCOUNT.ADA,  a  utility  contained  in  Appendix  B. 
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diphones,  41  percent  of  the  total  diphones,  account  for  over  90  percent  of  the  diphone 
occurrences.  That  means  that  fairly  large  vocabularies  can  be  built  from  far  fewer  than  1220  or 
1300  diphones. 


Cumulative  Distribution  of  (1195)  Diphones  in  limit  Lexicon 


Figure  3.1 .  Cumulative  distribution  of  diphones  in  the  TIMIT  lexicon.  Although  1 1 95  diphones  exist  in  the 
lexicon,  some  are  far  more  common  than  others.  Thus,  fairly  large  vocabularies  can  be  constructed  with 
far  fewer  diphones. 

The  remaining  paragraphs  of  this  section  describe  additional  advantages  of  diphone-based 
recognition. 


1.  Coarticulation 

The  junction  between  two  phonemes  constituting  a  diphone  is  acoustically  influenced  by 
both  phonemes.  To  a  phoneme  recognizor  these  coarticulation  effects  are  a  curse.  But  to  a 
diphone  recognizor  this  transition  region  is  the  key  to  recognition.  As  mentioned  earlier,  vowels 
are  relatively  easy  to  recognize  due  to  their  characteristic  formant  frequencies.  But  consonants 
are  less  recognizable  in  isolation.  This  issue  was  specifically  investigated  by  Sivakumar  and 
Robertson  [Ref.  5],  who  showed  that  formant  transitions  in  neighboring  vowels  could 
significantly  increase  accuracy  in  consonant  recognition  (especially  stop  consonants).  Such  an 
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advantage  is  particularly  important  in  view  of  the  fact  that  consonants  actually  carry  most  of  the 
information  in  speech*.  As  further  evidence  of  their  potential,  diphones  have  already  proven 
useful  in  speech  synthesis  [Ref.  7].  If  diphones  lead  to  more  natural  sounding  synthesized 
speech,  it  is  reasonable  to  assume  that  diphone  models  would  more  closely  match  actual  speech 
during  recognition. 

2.  Redundancy  and  Probability  of  Detection 

Each  diphone  carries  twice  as  much  information  as  a  phoneme.  To  understand  why  this  is 
useful,  consider  a  hypothetical  diphone  recognizor  and  a  hypothetical  phoneme  recognizor,  both 
with  85  percent  recognition  accuracy.  For  the  sake  of  comparison,  assume  that  both  recognizors 
are  uniformly  accurate  for  all  recognition  units  and  that  the  error  probabilities  between  adjacent 
speech  units  are  independent**.  For  the  word  "robot,"  consisting  of  five  phonemes  (R  OW  B  AA 
T),  the  probability  of  a  phoneme  recognizor  spotting  the  word  (assuming  no  missing  phonemes 
can  be  tolerated)  would  be: 

p(word  spotted  I  phoneme  recognizor)  =  .85®  =  .44  (3-1) 

The  same  five  phoneme  word  would  be  described  with  six  diphones  (HX-R  R-OW 
OW-B  B-AA  AA-T  T-HX***).  But  each  diphone  carries  two  phonemes.  So  two  adjacent 
diphones  would  have  to  be  missed  to  cause  the  word  to  be  missed.  In  this  case,  the  probability  of 
word  detection  is  given  by: 

p(word  I  diphones)  =  .85®  +  6(.85®  .15’)  +  10(.85'‘  .15®)  +  4(.85®  .15®)  =  .90  (3-2) 

The  formula  in  Equation  3.2  was  reached  by  counting  the  21  success  modes  out  of  the  2® 
or  64  possible  outcomes.  Only  one  outcome  consisted  of  six  correct  diphones  (contributing  .85® 

*  This  can  easily  be  verified  by  removing  the  vowels  from  a  typical  sentence.  The  result 
will  be  much  more  understandable  than  the  same  sentence  with  all  consonants  removed. 

**  Both  assumptions  are,  of  course,  incorrect.  Some  diphones  and  phonemes  will  enjoy 
lower  error  rates  and  "burst  errors"  are  very  likely  to  occur  in  continuous  speech. 

***  The  symbol  "HX"  is  used  throughout  this  thesis  as  a  dummy  phoneme  consisting  of  some 
period  of  silence. 
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to  the  overall  probability).  Six  outcomes  included  five  correct  and  one  incorrect  diphone  (adding 
6[.85*  .15'  ]  to  the  overall  probability).  Similarly,  there  were  10  outcomes  with  4  diphones 
correct  and  2  incorrect;  and  four  outcomes  with  three  diphones  correct  and  three  incorrect. 

Thus,  in  this  example  and  given  identical  sub-word  recognition  accuracies,  the  diphone 
recognizor  is  theoretically  capable  of  more  than  twice  the  word  recognition  accuracy  of  a 
phoneme  recognizor. 

3.  Search  Space  Reduction 

The  diphone  offers  one  final  advantage  as  a  recognition  unit.  This  last  advantage  becomes 
evident  when  considering  the  lexical  search  that  might  result  from  detecting  a  given  speech  unit 
in  the  speech  stream.  Again,  consider  a  hypothetical  phoneme  recognizor  and  a  hypothetical 
diphone  recognizor.  If  the  phoneme  recognizor  detects  what  appears  to  be  an  R  in  the  speech 
stream,  the  lexical  analyzer  must  consider  all  341  words  in  the  TIMIT  lexicon  that  begin  with 
that  phoneme.  And  enough  spurious  phonemes  will  probably  be  detected  after  that  to  keep  most 
of  those  words  in  the  search  tree. 

Contrast  this  situation  with  a  diphone  recognizor  that  has  detected  the  R-OW  diphone  in 
the  input  stream.  Only  20  words  in  the  TIMIT  lexicon  begin  with  that  diphone.  And  not  just  any 
B  in  the  input  stream  will  be  accepted  in  the  lexical  analyzer's  consideration  of  the  word  "robot." 
The  B  must  come  in  the  form  of  an  OW-B  or  B-AA  diphone.  Thus,  diphones  should  reduce  the 
lexical  analyzer's  burden  because  the  search  tree  can  be  pruned  much  more  quickly. 

D.  FEATURE  VECTOR 

A  critical  decision  facing  the  neural  network  designer  is  the  choice  of  input,  or  feature, 
vector.  One  cannot  expect  a  neural  network,  or  any  other  tool,  to  perform  pattern  recognition 
well  unless  relevant  inputs  are  provided.  For  example,  a  neural  network  provided  with  the  height 
and  weight  of  applicants  could  not  be  expected  to  accurately  predict  loan  defaults. 

In  addition  to  carefully  choosing  the  ANN's  inputs,  it  is  often  desirable  to  pre-process 
inputs  to  make  the  neural  network's  recognition  task  as  straightforward  as  possible.  In  speech 
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recognition,  the  most  obvious  example  of  pre-processing  is  transformation  from  the  time  domain 
to  the  frequency  domain. 

The  following  subsections  describe  the  inputs  selected  for  this  research,  the  preprocessing 
steps  used,  and  the  rationale  behind  their  selection. 

1.  Differenced  Cepstral  Coefficients 

Formant  peaks  can  be  difficult  to  identify  in  conventional  spectra,  as  is  evident  from  Figure 
3.2.  Cepstral  smoothing  is  a  method  frequently  used  to  make  the  formant  peaks  more  easily 
identifiable.  Cepstral  smoothing  is  explained  in  Figure  3.3  [after  Ref.  1]. 


FFT(voice)  =  FFT(pitch)  X  Fh'l  (articulators) 


log(hFl  (voice))  =  log(FFT(pitch))  +  log(FFT(articulators)) 


IFFr(log(FFT(voice))  =  Pitch  Effects  +  Articulator  Effects 


Figure  3-3.  An  intuitive  explanation  of  cepstral  smoothing  of  a  voice  signal  [after  Ref.  1].  The  top  row 
shows  that  a  voice  signal  in  the  conventional  frequency  domain  is  a  nonlinear  combination  (product)  of 
vocal  fold  excitation  effects  (pitch  “combing")  and  articulator  effects  (the  "spectral  envelope").  The  two 
components  are  difficults  to  separate.  But  the  logarithm  of  this  product  is  the  sum  of  the  logarithms  of  the 
components  (middle  row).  Thus,  an  inverse  Fourier  transform  can  be  used  to  separate  the  pitch  and 
articulator  effects  in  the  "quefrency  domain."  The  bottom  row  shows  a  cepstral  plot  of  a  voice  signal.  In 
the  cepstrum,  the  articulator  effects  are  concentrated  at  the  low  quefrency  end  and  pitch  effects  dominate 
at  high  quefrencies.  A  low-time  lifter  can  be  used  to  remove  the  pitch  peak  and  its  "rahmonics."  A  final 
Fourier  transform  then  produces  the  smoothed  spectrum  at  the  right  end  of  the  middle  row. 
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In  MATLAB,  the  cepstrum*  can  be  computed  with  the  following  function  calls; 


cepstrum  =  ifft(log(abs(fft(signal  .*  hamming(512)))));  (3.3) 

where  "signal"  is  a  PCM  speech  vector**,  "ifft"  is  the  MATLAB  inverse  fourier  transform 
function,  and  "fft"  is  the  fast  fourier  transform. 

Figure  3.4  is  a  cepstral  version  of  Figure  3.2.  The  low  quefrency  components  are  due  to 
articulator  effects.  The  speaker's  pitch  is  evident  in  the  series  of  high  quefrency  spikes.  A  low 
time  lifter  is  used  to  remove  the  pitch  effects  (the  dotted  line  in  Figure  3.4  shows  the  coefficients 
of  the  lifter  used)  and  a  final  FFT  produces  the  smoothed  spectrum  of  Figure  3.5.  In  MATLAB 's 
script  language,  these  final  steps  could  be  accomplished  by: 

smoothed_spectrum  =  fft(  cepstrum  .*  lifter );  (3.4) 

where  "lifter"  is  a  vector  of  low  time  lifter  coefficients. 

A  512  point  FFT  is  used  to  compute  the  cepstral  coefficients  used  in  the  actual  feature 
surface.  The  samples  are  Hamming  windowed  and  each  window  position  overlaps  the  previous 
window  position  by  approximately  one  third  (170  samples  or  about  10.6  milliseconds  at  the  16 
KHz  sample  rate).  Figure  3.6  illustrates  the  windowing  strategy.  Note  that  the  term  "sample 
window"  is  used  throughout  this  thesis  to  refer  to  the  512  point  FFT  sample  window.  The  term 
"recognition  window"  is  used  to  refer  to  the  longer  window  from  which  the  overall  feature  vector 
is  produced.  A  number  of  overlapped  sample  windows  are  required  to  constitute  a  single 
recognition  window. 


*  The  cepstrum  domain  is  not  the  frequency  domain.  So  unique  terms  have  been  invented 
to  deal  with  it.  The  most  commonly  used  terms  include  quefrency  (modification  of  the  word 
frequency),  lifter  (from  filter),  rahmonics  (from  harmonics),  and  cepstrum  itself  (from  spectrum). 
[Ref.  1] 

**  In  MATLAB's  script  language,  the  operator  means  to  multiply  each  element  in  one 
vector  by  the  corresponding  element  in  the  other  vector.  The  hamming(n)  function  returns  a 
n-point  hamming  window  vector. 
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Quelrency 


Figure  3.4.  A  cepstral  version  of  Figure  3.2.  The  prominent  spike  near  the  center  of  the  quefrency  scale 
is  a  pitch  peak.  Articulator  effects  are  concentrated  at  the  left  end  of  the  quefrency  scale.  The  dotted 
outline  illustrates  the  location  of  the  low-time  lifter  cutoff. 


Spectrum  of  lY  Vowel  After  Cepstral  Smoothing 


Figure  3.5.  The  spectrum  of  Figure  3.2  after  cepstral  smoothing.  Note  that  the  formant  peaks  are  now 
easier  to  identify. 
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Figure  3.6.  Windowing  strategy  used  by  the  diphone  recognizor.  The  recognition  window  is  composed  of 
a  sequence  of  overlapped  FFT  (512  point)  sample  windows. 

Figure  3.7  is  an  example  of  the  feature  surface  used  in  this  research.  Each  row  is  produced 
from  a  single  sample  window  position.  Twenty-nine  columns  of  each  row  in  the  feature  surface 
are  devoted  to  cepstral  information.  The  thirtieth  colunrn  contains  information  related  to  the 
signal's  short-time  power  (explained  in  more  detail  later  in  this  section).  The  30  columns  of  the 
feature  surface  are  ultimately  sequenced  into  a  single  feature  vector  for  presentation  to  the  neural 
network.  Later  paragraphs  describe  the  number  of  feature  surface  rows  used  in  a  feature  vector 
(its  length  in  the  time  domain). 

A  speaker's  volume  can,  of  course,  effect  the  magnitude  of  the  spectral  peaks.  To  reduce 
the  impact  of  this  potentially  confounding  variable,  each  cepstral  coefficient  in  the  feature  vector 
actually  represents  the  difference  between  that  time  period's  coefficient  and  the  coefficient  of  the 
previous  time  step.  This  is  analogous  to  the  use  of  capacitive  coupling  to  remove  a  direct  current 
offset  from  an  oscilloscope's  input.  The  delta  or  differenced  cepstrum,  as  this  technique  is  called, 
has  been  used  with  some  success  in  previous  research  [Ref.  1].  It  seems  especially  appropriate 
for  diphone  recognition,  where  the  dynamic  aspects  of  speech  are  relevant,  rather  than  annoying. 
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Figure  3.7.  An  example  of  the  feature  surface  used  for  diphone  recognition.  The  final  design  included  13 
rows  (about  289  ms)  by  30  columns  (29  cepstrally-smoothed,  differenced  frequency  values  and  one 
differenced  RMS  amplitude  measure).  The  columns  were  sequenced  into  a  single  (390  element)  feature 
vector  for  presentation  to  the  neural  network. 


2.  Critical  Bands 


The  512  point  FFT  returns  spectral  information  in  256  "frequency  bins"  of  31.25  Hz  per 
bin  (at  a  sampling  rate  of  16  KHz),  Even  if  it  were  practical  to  include  256  inputs  for  each  32 
miUisecond  slice  of  speech,  it  would  not  be  desirable.  The  resulting  feature  vector  would 
probably  exhibit  too  much  granularity.  In  particular,  FI  frequencies  (for  the  same  vowel) 
differing  by  just  20  or  30  Hz  would  trigger  entirely  different  inputs.  On  the  other  hand,  evenly 
dividing  the  8000  Hz  spectrum  between  the  29  columns  of  Figure  3.7  would  give  each  input  a 
bandwidth  of  276  Hz.  The  resulting  loss  in  resolution  would  often  cause  the  FI  frequencies  of 
entirely  different  vowels  to  merge  into  a  single  input. 
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This  dilemma  is  often  resolved  through  the  use  of  critical  band  filters.  Psychoacoustic 
research  has  shown  that  human  perception  of  a  given  frequency  is  influenced  by  a  critical  band  of 
neighboring  frequencies  [Ref.  7].  At  lower  frequencies,  the  critical  bands  are  narrower.  As  a 
result,  the  human  auditory  system  exhibits  finer  resolution  at  lower  frequencies. 

In  an  effort  to  mimic  critical  bands,  the  FFT's  output  bins  were  unevenly  distributed 
between  the  29  columns  of  each  feature  surface  row.  Figure  3.8  shows  the  bandwidth  of  each  of 
the  29  differenced  cepstral  inputs  of  the  feature  surface  colunms.  The  exact  frequency 
assignments  can  be  found  in  Appendix  A  (the  FEATURES.M  script  file).  The  bandwidth  for 
each  center  frequency  was  computed  using  the  following  approximation  [Ref.  22]: 

BWj  =  25  +  75  [  1  +  1.4(  Fc/1000  f  (3.5) 

where  is  the  center  frequency. 


Critical  Frequency  Bands  | 
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Figure  3.8.  Critical  frequency  bands  used  in  computation  of  feature  surface.  At  low  frequencies,  fewer 
FFT  frequency  coefficients  are  averaged  to  produce  the  feature  surface  column  values.  Thus,  spectral 
resolution  is  finer  at  low  frequencies  (as  it  is  for  human  hearing). 
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Each  center  frequency  was  computed  using  the  recursive  formula: 

FCj  =  START,  +  0.75  (BW,)  +  (BW,  /  2)  (3.6) 

where  START;  is  the  previous  band's  start  frequency. 

This  results  in  an  overlap  between  bands  ranging  from  about  25  percent  at  the  low 
frequency  end  to  about  29  percent  at  the  high  end.  The  first  band's  center  frequency  was  defined 
to  be  50  Hz  with  a  bandwidth  of  100  Hz.  The  actual  value  used  for  a  given  feature  surface 
colunm  is  the  mean  of  the  contents  of  the  FFT  frequency  bins  falling  in  that  column's  bandwidth. 

3.  Short-Time  Power  Differences 

Although  the  frequency  domain  is  more  useful  in  many  cases,  it  does  not  seem  wise  to 
completely  ignore  the  time  domain.  As  one  example  of  its  value,  recall  that  the  speech  envelope 
offers  important  clues  to  the  presence  of  stop  consonants.  Thus,  the  thirtieth  column  of  each 
feature  surface  row  originally  contained  a  measure  of  the  differenced  short-term  power  [Ref.  1] 
for  that  column's  sample  window  position.  For  the  (n-long)  sample  window  at  position  k: 

AP(k)  =  {(Ss2(t))/n)-AP(k-l)  (3.7) 

where  s\t)  is  the  square  of  the  amplitude  of  the  voice  signal  at  time  t. 

As  explained  in  Chapter  IV,  a  better  choice  would  have  been  the  differenced  RMS 
amplitude: 

AARMs(k)  =  -  AARMs(k-  1)  (3.8) 

4.  Recognition  Window  Length 

So  far,  no  mention  has  been  made  of  the  length  of  the  recognition  window.  This  is  actually 
a  difficult  question.  Figure  3.9  is  a  histogram  of  maximum  durations  of  diphones  in  the  TIMIT 
SX  and  SI  sentences  (described  later).  Since  the  majority  of  diphones  last  no  more  than  500 
milliseconds,  the  initial  version  of  the  feature  surface  contained  23  rows,  with  each  containing 
spectral  and  short-time  power  information  from  a  single  512  point  (32  millisecond)  sample 
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window.  Because  the  windows  were  overlapped  by  one  third,  the  recognition  window  was 
approximately  502  milliseconds  wide  (8036  samples). 


Diphone  Durations  Histogram  (TIMIT  SX  and  SI  Sentences) 


Figure  3.9.  Histogram  of  durations  of  diphones  in  the  TIMIT  SX  and  SI  sentences. 

It  is  clear  from  Figure  3.9  that  the  vast  majority  of  diphones  are  too  short  in  duration  to  fill 
such  a  large  recognition  window.  It  was  hoped  that  the  neural  network  would,  during  training, 
reduce  the  weights  of  the  outlying  rows  for  short  diphones.  As  explained  in  Chapter  IV,  it  later 
became  clear  that  the  recognition  window  size  should  be  reduced  to  approximately  289 
milliseconds  (4616  samples).  This  is  actually  not  surprising,  in  that  Kamm  and  Singhal  found  an 
input  span  of  about  125  milliseconds  worked  best  for  phoneme  recognition  [Ref.  23],  and  a 
diphone  consists  of  two  phonemes. 

5.  Precomputation  of  Feature  Vectors 

Before  leaving  the  topic  of  feature  vector  design,  it  should  be  mentioned  that  all  of  the 
feature  vectors  used  for  network  training  and  testing  were  precomputed  and  saved  in  MATLAB 
data  files.  Each  feature  vector  requires  approximately  800,000  to  one  million  floating  point 
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operations  to  compute.  And  the  same  feature  vectors  were  used  during  each  of  thousands  of 
passes  through  the  data  (by  hundreds  of  networks).  It  would  have  been  very  inefficient  to 
recompute  the  feature  vectors  from  the  raw  speech  waveforms  during  each  pass  through  the  data 
(up  to  20,190  feature  vectors  per  pass). 

E.  PITCH-BASED  FREQUENCY  NORMALIZATION 

As  mentioned  earlier,  it  is  desirable  to  remove  from  the  feature  vector  any  features  that  are 
not  relevant.  The  speaker's  pitch  or  fundamental  frequency  is  such  a  feature.  Pitch  is  a  much 
better  indicator  of  the  gender  (or  age)  of  the  speaker  than  of  what  is  being  spoken. 

As  mentioned  in  Chapter  n,  formant  frequencies  and  pitch  tend  to  be  inversely  proportional 
to  vocal  tract  length.  Several  researchers;  most  notably  Gersten,  Wakita,  and  Neuburg;  have 
attempted  to  normalize  formant  frequencies  of  individual  speakers  to  improve  recognition 
accuracy  [Ref.  3].  Wakita's  effort  is  especially  interesting.  He  used  measured  formant  data  to 
estimate  the  speaker's  vocal  tract  length.  He  then  normalized  the  formant  frequencies  based  on 
the  vocal  tract  length  estimate.  Normalization  reduced  a  vowel  recognizor's  error  rate  from  21  to 
15.6  percent.  Neuburg  also  attempted  various  scalings  of  the  firequency  scale  based  on  the 
measured  formant  frequencies. 

Unfortunately,  the  methods  described  in  [Ref.  3]  depend  on  identification  of  the  speaker's 
formant  frequencies  and  then  normalization  of  those  frequencies.  These  techniques  add  to  the 
complexity  of  the  preprocessing,  they  are  difficult  to  implement  in  real  time,  and  errors  in 
formant  identification  could  be  expected  to  reduce  accuracy. 

Because  both  pitch  and  formant  frequencies  are  inversely  proportional  to  vocal  tract  length, 
it  seems  reasonable  to  use  pitch  itself  as  the  basis  for  formant  normalization.  Therefore,  an  early 
part  of  this  research  was  the  measurement  of  pitch  and  formant  frequencies  for  a  variety  of 
speakers.  The  results  of  this  investigation  are  described  in  Chapter  IV.  For  now,  it  is  sufficient 
to  say  that  pitch  does  appear  to  be  a  useful  basis  for  normalization.  Accordingly,  the  feature 
vectors  used  during  this  research  were  frequency-normalized  using  the  procedure  described  in 
Chapter  IV. 


52 


F.  TIME  ALIGNMENT  USING  NEURAL  NETWORK  FEEDBACK 


Because  of  the  dynamic  nature  of  diphones,  correct  time-alignment  with  the  recognition 
window  is  doubly  important.  One  cannot  expect  a  neural  network  to  easily  see  the  similarities  in 
time-shifted  versions  of  a  rapidly  changing  waveform.  Fortunately,  the  TIMIT  speech  data  used 
in  this  research  includes  time-aligned  phonetic  transcriptions  that  can  be  used  to  (approximately) 
center  the  diphone's  phoneme  junction  in  the  recognition  window.  But  the  phonetic 
transcriptions  are  based  on  a  semi-automated  process  of  labeling  phoneme  boundaries  that  are 
themselves  usually  quite  indistinct  [Ref.  24],  Figure  3.10  illustrates  that  variability  exists,  even 
in  diphones  with  sharply  defined  features,  such  as  stop  consonants.  So  it  seems  likely  that 
recognition  accuracy  could  be  improved  by  more  accurate  time-alignment. 


Short  Term  RMS  Amplitude  Difference  -vs- Time  forT  IH 


Figure  3.10.  RMS  amplitude  columns  of  several  feature  surfaces  for  the  same  diphone  (different 
speakers).  Notice  the  variability  in  time  alignment  (based  on  TIMIT  phonetic  transcriptions). 

Unfortunately,  it  is  not  easy  to  determine  just  what  recognition  window  position  is  correct. 

One  could  conceivably  compare  the  speech  envelope  with  some  diphone-specific  template  and 

use  the  window  position  that  minimized  the  sum-squared  error.  But  it  would  be  necessary  to 
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de-emphasize  the  tails  of  the  window  when  aligning  short  diphones.  Also,  the  process  of 
developing  templates  would  be  challenging  and  error  prone.  Finally,  the  usefulness  of  such  an 
approach  might  be  hampered  by  the  fact  that  not  all  important  features  are  evident  in  the  time 
domain.  For  example,  voice  onset  time  (period  of  time  between  a  stop's  release  and  the 
resumption  of  voicing)  is  a  key  feature  for  distinguishing  voiced  plosives  from  unvoiced 
plosives.  Figure  3.11  shows  a  plot  of  one  spectral  band  of  the  same  diphones  shown  in  Figure 
3.10.  Note  that  this  band  shows  even  greater  variability  than  the  RMS  amplitude  plot. 


One  Cepstral  Difference  Column  -vs- Time  forT  IH 


Figure  3.1 1 .  One  feature  surface  cepstral  difference  column  for  same  speech  samples  as  Figure  3.1 0. 

The  time-alignment  method  used  in  this  research  makes  use  of  feedback  from  the  neural 
network  itself.  Specifically,  the  neural  network  is  trained  using  transcription-based  alignments 
until  test  set  errors  appear  to  have  reached  a  minimum.  Then,  each  feature  vector  is  shifted 
through  the  recognition  window  until  the  correct  output  is  maximized.  The  window  position  that 
maximizes  the  correct  output  is  used  to  perform  the  weight  adjustments.  With  this  approach  in 
mind,  the  precomputed  feature  surface  files  contained  27  rows.  This  allowed  the  initial  23  row 
recognition  window  to  be  shifted  left  or  right  by  up  to  two  time  steps  (about  43  milliseconds). 

A  second  feature  surface  was  also  computed  for  each  diphone.  The  second  feature  surface 
was  computed  by  shifting  the  recognition  window  to  a  start  position  about  10  milliseconds  earlier 
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in  the  speech  sample.  This  meant  that  the  two  feature  surfaces  were  interleaved  in  time  (in  case 
20  millisecond  time  steps  were  too  coarse).  As  a  result,  the  effective  temporal  resolution  was 
about  10  milliseconds.  Because  the  time-alignment  method  evolved  considerably  during  the 
conduct  of  this  research,  further  details  are  deferred  until  Chapter  IV. 

G.  NEURAL  NETWORK  TOPOLOGY 

Neural  network  design  is  still  more  an  art  than  a  science.  There  are  few  generally  accepted 
theories  to  guide  the  designer  of  networks  that  must  be  used  on  real-world  problems.  Instead, 
researchers  must  often  resort  to  rules  of  thumb  and  experimentation.  In  particular,  it  can  be 
difficult  to  determine  the  number  of  hidden  nodes  required  to  solve  a  non-trivial  problem.  This 
section  describes  the  architecture  selected  for  this  research,  as  well  as  the  justification  for  that 
selection. 

1.  Single  Hidden  Layer  Versus  Two  Hidden  Layers 

One  key  architectural  question  is  the  number  of  hidden  layers  to  use  in  the  neural  network. 
Homik,  Stinchcombe,  and  White  proved  in  1989  that  a  neural  network  with  a  single  hidden  layer 
and  sigmoidal  activation  function*  can  approximate  any  mapping  to  any  degree  of  accuracy  given 
enough  nodes  in  the  hidden  layer  [Ref.  25].  Though  their  existence  proof  offers  no  advice  on  the 
number  of  hidden  nodes  required,  it  is  encouraging  to  know  that  a  single  hidden  layer  can 
theoretically  handle  any  mapping.  Nevertheless,  other  researchers  have  shown  that  two  hidden 
layers  will  perform  better  (fewer  nodes  and  better  accuracy)  on  problems  where  the  output 
classes  cannot  be  separated  into  convex**  open  or  closed  regions  [Ref.  26].  Some  texts  have  even 
referenced  Homik's  proof  on  one  page,  while  stating  (incorrectly)  on  another  that  a  single  hidden 
layer  is  only  sufficient  if  the  decision  boundaries  are  convex. 

While  the  theoretical  arguments  seem  to  favor  two  hidden  layers  on  complex  mappings, 
practitioners  often  argue  for  a  single  hidden  layer  [Ref.  27].  Because  it  is  impractical  to 
determine  whether  diphone  classes  can  be  bounded  by  convex  regions  in  an  input  space  that  has 

*  In  1991  Homik  removed  the  requirement  that  the  transfer  function  be  sigmoidal.  It  is 
only  necessary  that  is  be  sufficiently  smooth. 

**  A  convex  region  is  a  region  in  which  any  two  points  in  the  region  can  be  connected  by  a 
line  that  does  not  leave  the  region  [Ref.  12]. 
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several  hundred  dimensions,  a  simple  experiment  was  conducted  in  an  effort  to  shed  light  on  the 
convexity  issue.  Figure  3.12  shows  a  problem  with  three  output  classes.  Note  that  none  of  the 
three  classes  can  be  bounded  by  a  convex  region  without  including  members  of  another  class. 
This  problem  was  solved  with  a  back-propagation  neural  network  consisting  of  two  inputs,  10 
hidden  neurons,  and  three  output  neurons.  Figure  3.13  shows  the  classification  lines  defined  by 
the  trained  hidden  layer's  weights.  The  sum-squared  error  (SSE)  after  755  epochs  (an  epoch  is 
one  complete  pass  through  the  training  data)  was  less  than  10  ®.  While  this  example  does  not 
prove  that  a  single  hidden  layer  is  sufficient  for  every  problem,  it  does  disprove  the  notion  that 
networks  with  a  single  hidden  layer  are  incapable  of  solving  all  problems  with  non-convex 
decision  boundaries. 
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Problem  with  3  Output  Classes  &  Non-Convex  Decision  Regions 
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Figure  3.12.  A  classification  problem  with  three  output  classes  that  cannot  be  separated  using  convex 
decision  regions.  A  neural  network  with  a  single  hidden  layer  proved  capable  of  handling  this  problem. 


*  A  sum-squared  error  of  zero  would  require  that  the  correct  neuron's  output  be  a  one  and 
all  other  ouq)uts  zero.  In  practice,  it  is  usually  only  necessary  that  the  correct  output  be  the 
strongest  for  correct  classification.  In  this  example,  all  input  vectors  were  correctly  classified 
long  before  epoch  755.  The  SSE  error  goal  of  10  ®  was  arbitrarily  selected. 
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Figure  3.13.  Classification  lines  defined  by  10  sets  of  hidden  layer  weights.  The  output  layer’s  input 
space  has  10  dimensions  so  it  cannot  be  plotted. 


For  comparison  purposes,  several  networks  with  two  hidden  layers  were  trained  on  the 
same  problem.  After  some  experimentation,  it  was  determined  that  at  least  five  neurons  in  the 
first  hidden  layer  and  five  neurons  in  the  second  hidden  layer  were  required  to  solve  the  same 
problem.  While  this  is  the  same  number  of  hidden  nodes  as  the  network  with  just  one  hidden 
layer,  the  new  network  required  5436  epochs  to  reach  the  same  error  goal.  Thus,  networks  with 
two  hidden  layers  are  apparently  not  always  superior,  even  with  non-convex  decision  boundaries. 
It  should  also  be  noted  that  this  second  network  was  constructed  only  after  it  was  known  that  10 
neurons  were  sufficient  for  a  single  hidden  layer.  Armed  with  this  knowledge,  and  the 
assumption  that  two  hidden  layers  would  require  a  total  of  nearly  that  many  neurons,  it  was  a 
fairly  simple  matter  to  come  up  with  a  working  design. 


Although  two  hidden  layers  may  be  better,  at  least  theoretically,  for  some  problems,  a 
network  with  just  one  hidden  layer  has  an  important  practical  advantage.  Specifically,  it  is  only 
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necessary  to  determine  the  size  of  one  layer.  The  size  of  the  feature  vector  and  the  number  of 
output  neurons  can  be  derived  from  the  nature  of  the  problem.  But  experimentation  is  the  only 
sure  way  to  determine  the  optimal  number  of  hidden  neurons.  In  the  example  problem  just 
described,  the  network  with  a  single  hidden  layer  was  first  trained  with  32  hidden  nodes.  The 
hidden  layer  was  then  reduced  in  size  until  the  network  failed  to  converge  (at  9  or  fewer 
neurons).  The  network  with  two  hidden  layers  was  actually  harder  to  size  and  test  because  two 
hidden  layers  had  to  be  sized. 

Because  the  networks  used  in  this  research  required  up  to  five  weeks  of  training  time  and 
trillions  of  floating  point  operations,  it  was  important  to  minimize  the  number  of  network 
topologies  to  be  tested.  Primarily  for  this  reason,  the  neural  networks  used  in  the  remainder  of 
this  research  consisted  of  just  one  hidden  layer. 

H.  TRANSLATING  DIPHONES  TO  WORDS 

Whether  using  the  diphone  or  the  phoneme  as  a  recognition  unit,  high  accuracy  at  the 
sub-word  level  is  no  guarantee  of  similar  accuracy  at  the  word  level.  So  a  prototype  lexical 
analyzer  was  constructed,  using  Common  LISP,  to  determine  if  the  assumed  advantages  of 
diphones  were  real.  The  following  approach  is  used: 

•  Two  interleaved  recognition  windows  are  stepped  through  the  PCM  speech  file*,  each 
moving  21  milliseconds  (342  samples)  per  time  step.  The  resulting  temporal 
resolution  is  about  10  milliseconds  (170  samples). 

•  For  each  recognition  window  position,  the  three  output  neurons  with  the  strongest 
outputs  are  identified.  A  text  line  is  then  appended  to  the  output  file,  listing  the 
identities  and  output  values  of  these  neurons,  as  well  as  the  current  window  position 
(to  serve  as  a  "time  stamp"). 

•  A  Common  LISP  program  reads  in  the  neural  network's  output  as  a  LISP  list.  In  cases 
where  a  given  diphone  appears  in  the  one  of  the  top  three  positions  for  more  than  one 


*  Precomputed  feature  vectors  were  used  during  training  only.  TIMIT  PCM  speech  files 
were  used  during  word  recognition  tests. 
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consecutive  time  step,  only  the  entry  with  the  highest  output  value  (the  output  peak)  is 
retained. 

•  The  remaining  output  peaks  are  then  processed  sequentially  and  in  chronological 
order.  Each  diphone  peak  is  assumed  to  potentially  be  the  start  of  a  word. 

•  For  each  word  in  the  phonetic  dictionary  that  begins  with  the  current  diphone,  the 
lexical  analyzer  scans  forward  in  the  diphone  stream,  looking  for  diphones  containing 
the  remaining  phonemes  of  that  word.  The  search  is  bounded  by  maximum  and 
minimum  expected  durations  for  the  phoneme.  In  other  words,  any  diphones 
containing  the  required  phoneme  must  be  tagged  with  times  that  make  sense  in  view  of 
the  phoneme's  expected  duration.  Otherwise,  they  are  ignored. 

•  A  word  is  written  to  the  lexical  analyzer's  output  if  all  of  its  phonemes  appear  in  the 
diphone  stream  (and  with  reasonable  time  stamps).  A  confidence  value  (a  simple 
average  of  the  diphone  recognizor  outputs  for  the  diphones  in  the  word)  and  time 
stamp  are  also  written  to  the  output. 

•  Thresholds  are  used  to  prevent  weak  diphone  recognizor  outputs  from  triggering  this 
search  process.  Similarly,  a  threshold  is  used  to  prevent  the  output  of  words  with  low 
overall  confidence  values. 

•  A  second  phonetic  dictionary  is  maintained,  sorted  by  the  second  phoneme  in  each 
word.  This  allows  the  lexical  analyzer  to  spot  words  in  the  diphone  stream  even  if  the 
first  two  diphones  are  lost  (taking  the  word's  first  phoneme  with  them). 

Further  details  of  this  algorithm  can  be  found  in  Chapter  IV  and  Appendix  C  (primarily  in 
RECOGNIZ.CL). 

I.  TRAINING  AND  TEST  DATA 

Early  speech  recognition  researchers  were  forced  to  build  their  own  speech  databases. 
Thus,  a  significant  amount  of  time  and  money  had  to  be  invested  in  gathering  training  and  test 
data  before  any  useful  research  could  be  conducted.  In  an  effort  to  solve  this  problem,  DARPA 
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launched  a  large  speech  database  development  program  in  1984  as  part  of  its  Strategic 
Computing  speech  program  [Ref.  24].  One  of  the  results  of  this  effort  was  the  TIMIT  speech 
database,  developed  jointly  by  researchers  at  Texas  Instruments  (TI),  the  Massachusetts  Institute 
of  Technology  (MIT),  and  SRI  International. 

The  TIMIT  database  is  currently  distributed  on  CD-ROM  through  the  National  Institute  of 
Standards  and  Technology  (NIST).  The  database  consists  of  a  total  of  6,300  sentences  from  630 
speakers,  totaling  over  5  hours  of  speech.  The  sentences  are  already  partitioned  into 
recommended  training  and  test  sets,  with  27  percent  of  the  speakers  assigned  to  the  test  set.  Test 
set  speakers  do  not  also  appear  in  the  training  set.  Both  the  test  set  and  the  training  set  are 
partitioned  into  eight  dialect  regions  (New  England,  Northern,  North  Midland,  South  Midland, 
Southern,  New  York  City,  Western,  and  "Army  Brat"  [moved  around]).  Each  dialect  region  is 
further  partitioned  by  individual  speaker,  with  ten  sentences  per  speaker.  Of  the  total,  70  percent 
of  the  speakers  are  male  and  30  percent  female.  [Ref.  2] 

To  support  the  variety  of  research  being  conducted,  three  types  of  sentences  were  included 
in  the  TIMIT  database.  The  dialect  sentences  (prefixed  by  SA  in  the  database)  were  designed  to 
expose  dialectical  variations  between  speakers.  Only  two  such  sentences  are  included  in  the 
database  and  each  speaker  recorded  both.  The  phonetically  diverse  (SI)  sentences  were  randomly 
selected  from  a  variety  of  sources  to  generate  diversity  in  sentence  types  and  phonetic  contexts. 
Each  speaker  read  three  SI  sentences  and  no  two  speakers  read  the  same  sentence.  Finally,  the 
phonetically-compact  (SX)  sentences  were  hand-designed  at  MIT  to  be  compact  and  provide 
good  phonetic  coverage.  Each  speaker  read  five  SX  sentences.  [Ref.  2] 

Because  of  their  compactness  and  comprehensive  phonetic  coverage,  the  TIMIT  SX 
sentences  were  selected  for  use  in  this  research.  Only  sentences  from  the  TIMIT  training  set 
were  used  for  training  (weight  adjustments)  and  only  test  set  sentences  were  used  for  testing. 

Before  the  neural  network  feature  vectors  could  be  computed,  it  was  necessary  to  identify 
the  specific  diphones  and  sentences  to  be  used.  A  number  of  automated  tools  were  developed  to 
assist  with  this  task.  The  ADA  source  code  for  these  utilities  can  be  found  in  Appendix  B.  Four 
speech  data  sets  were  ultimately  used  for  neural  network  training.  The  first  two  sets,  built  around 
19  and  44  diphones  respectively,  were  used  primarily  for  software  testing  and  experimentation 
with  various  training  algorithms.  These  two  data  sets  will  not  be  discussed  in  depth.  The  other 


60 


two  speech  data  sets  included  samples  of  the  157  most  common  diphones  in  the  SX  sentences. 
These  two  data  sets  are  described  in  more  detail  later  in  this  section.  Details  on  the  mechanics  of 
building  the  speech  data  sets  can  be  found  in  the  source  code  comments  of  Appendix  B.  For 
now,  the  following  facts  should  suffice: 

•  A  utility  (DIPNUMS.ADA)  was  used  to  sift  through  the  TIMIT  phonetic  transcriptions 
and  determine  how  many  and  which  diphones  were  available. 

•  It  was  assumed  (based  on  intuition)  that  at  least  thirty  occurrences  of  each  diphone 
would  be  needed  to  train  the  neural  networks  to  a  reasonable  degree  of  generalization. 
It  was  also  assumed  that  100  occurrences  would  be  better. 

•  Only  157  diphones  were  common  enough  to  occur  at  least  100  times  in  the  TIMIT  SX 
training  set.  These  157  diphones  were  therefore  used  to  produce  the  two  speech  data 
sets  described  in  the  following  subsections.  The  diphones  themselves  are  listed  in 
Appendix  D. 

1.  The  Medium-Sized  Data  Set 

It  would  not  have  been  wise  to  use  an  overly  large  data  set  for  early  neural  network  training 
attempts.  A  great  deal  of  trial  and  error  was  needed  to  determine  the  number  of  hidden  nodes 
required.  It  was  also  frequently  necessary  to  refine  the  training  methodology  and  determine  the 
effect  of  a  number  of  parameters  described  in  Chapter  IV.  Since  a  common  UNIX  workstation 
requires  approximately  eight  hours  for  a  single  pass  through  the  large  data  set  (described  later),  it 
was  necessary  to  do  much  of  the  experimentation  with  a  more  modest  data  set.  The  medium 
training  set  consisted  of  30  examples  of  each  of  the  157  diphones  mentioned  earlier  (4710  files). 
Because  some  diphones  are  more  common  in  the  TIMIT  training  set  than  in  the  test  set,  most  but 
not  all  of  the  157  diphones  occur  10  times  in  the  medium  test  set  (1568  files).  The  TIMIT  file 
listings  were  "shuffled"  before  selection  of  all  data  sets,  in  an  effort  to  achieve  a  more  uniform 
gender  and  dialectical  distribution  (the  selection  procedure  is  described  in  Appendix  B).  Table 
3.1  lists  the  demographics  of  the  speakers  included  in  the  medium  training  and  test  sets. 
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Table  3.1.  Description  of  Medium  Data  Set 


2.  The  Large  Data  Set 

The  large  training  set  consisted  of  100  examples  of  each  of  the  157  diphones  in  Appendix 
D  (15,700  files).  Again,  some  diphones  are  more  common  in  the  TIMIT  training  set  than  in  the 
test  set.  So  most  but  not  all  of  the  157  diphones  occur  30  times  in  the  large  test  set  (4490  files). 
Table  3-2  lists  the  demographics  of  the  speakers  included  in  the  large  training  and  test  sets. 


CATEGORY 

TRAINING  SET 

TEST  SET 

TOTAL 

MALE  SPEAKERS  /  SAMPLES 

326/11106 

110/3945 

436/15051 

FEMALE  SPEAKERS  /  SAMPLES 

136/4594 

53/545 

189/5139 

DIALECT  REGION  1  SAMPLES 

1,315 

294 

1,609 

DIALECT  REGION  2  SAMPLES 

2,702 

686 

3,388 

DIALECT  REGION  3  SAMPLES 

2,597 

691 

3,288 

DIALECT  REGION  4  SAMPLES 

2,270 

727 

2,997 

DIALECT  REGION  5  SAMPLES 

2,293 

808 

3,101 

DIALECT  REGION  6  SAMPLES 

1,112 

291 

1,403 

DIALECT  REGION  7  SAMPLES 

2,629 

660 

^  3,289 

DIALECT  REGION  8  SAMPLES 

782 

333 

1,115 

TOTAL  DIPHONE  SAMPLES 

15,700 

4,490 

20,190 

— 

Table  3.2.  Description  of  Large  Data  Set 
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J.  TRAINING  METHODOLOGY 


The  MATLAB  Neural  Network  Toolbox  and  MATLAB  script  files  were  used  to  simulate 
and  train  the  neural  networks  used  in  this  research.  Appendix  A  contains  the  script  files 
specifically  written  for  this  purpose.  The  main  training  script  (TRNBPX.M)  operates  as  follows: 

•  At  startup,  the  script  file  reads  a  file  containing  a  number  of  training  parameters,  such 
as  the  number  of  hidden  nodes  and  the  recognition  window  width.  It  also  loads  a  list 
of  diphones  defining  the  output  classes.  Finally,  it  opens  two  text  files  listing  the  files 
containing  the  precomputed  diphone  feature  vectors. 

•  After  one  complete  pass  through  the  training  set,  the  new  weights  are  tested  using  the 
entire  test  set.  Statistics  for  both  training  and  test  sets  are  then  written  to  a  text  file.  If 
the  test  results  indicate  that  the  test  set  errors  have  reached  a  new  miniTnnm,  the 
weights  are  saved  to  a  file.  If,  on  the  other  hand,  the  new  weights  actually  increase 
training  set  SSE  by  a  predetermined  amount  (set  by  the  "er"  parameter),  the  weight 
changes  are  discarded  and  the  epoch  is  repeated  with  a  lower  learning  rate. 

•  The  current  state,  including  the  latest  weights,  is  written  to  a  file  to  allow  resumption 
of  training  even  if  the  job  terminates  unexpectedly  (with  up  to  several  weeks  of  CPU 
time  invested  in  the  network's  weights,  it  would  not  be  wise  to  tempt  fate). 

•  The  entire  process  (an  epoch)  is  repeated  until  the  error  rate  becomes  satisfactory  or 
the  job  is  terminated. 

It  is,  of  course,  quite  expensive,  in  terms  of  CPU  time,  to  run  through  the  entire  test  set 
during  every  epoch.  But  test  set  errors  can  exhibit  sharp  drops  and  spikes  from  one  epoch  to  the 
next.  And  training  set  errors  are  a  poor  short  term  indicator  of  test  set  errors.  In  fact,  a  drop  in 
training  set  errors  is  often  accompanied  by  at  least  a  temporary  rise  in  test  set  errors.  Also,  a 
neural  network's  ability  to  generalize  (correctly  categorize  never-before-seen  data)  will  suffer  if 
the  network  is  overtrained.  The  usual  intuitive  explanation  is  that  the  network  leams  too  many 
details  about  the  test  set,  details  that  do  not  contribute  to  correct  classification  of  unfamiliar  data 
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Thus,  a  pass  through  the  test  set  after  every  epoch  is  the  best  way  to  ensure  that  the  best  weights 
will  be  found. 

K.  SUMMARY 

Chapter  IE  has  outlined  the  plan  under  which  this  research  was  conducted.  The  next 
chapter  describes  the  results  of  carrying  out  that  plan. 
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IV.  RESULTS 


A.  INTRODUCTION 

Not  surprisingly,  some  details  of  the  methodology  evolved  as  results  accumulated  and  were 
analyzed.  However,  this  research  largely  supports  the  assumptions  and  expectations  of  Chapter 
in.  The  following  sections  describe  the  key  results.  In  particular,  frequency  normalization 
results,  diphone  and  word  recognition  statistics,  and  the  impact  of  feedback-based  time  alignment 
are  described.  Emphasis  is  also  given  to  reporting  the  apparent  impact  of  various  neural  network 
training  alternatives. 

B.  FREQUENCY  NORMALIZATION 

As  mentioned  in  Chapters  H  and  HI,  both  pitch  and  formant  frequencies  vary  inversely  with 
the  speaker's  vocal  tract  length.  Pitch  can  be  detected  fairly  easily  in  a  voiced  speech  signal*.  So 
a  study  was  conducted  to  determine  if  pitch  would  be  a  useful  basis  for  formant  frequency 
normalization.  If  so,  speaker  independent  recognition  accuracy  could  be  expected  to  improve. 

The  lY  vowel  was  selected  as  the  subject  of  this  study.  As  indicated  in  Figure  2.5,  this 
vowel  is  characterized  by  good  separation  between  its  formant  frequencies.  After  cepstral 
smoothing,  the  formant  frequencies  were  visually  identified  and  recorded  along  with  the 
speaker's  pitch.  A  total  of  209  data  points  were  taken  from  70  speakers,  57  samples  from  21 
females  and  152  samples  from  49  males.  All  samples  were  taken  from  SX  sentences  and  Dialect 
Region  2.  The  following  subparagraphs  describe  the  data  and  the  normalization  method  based  on 
that  data. 

1.  Distribution  of  Formant  Frequencies 

Figures  4.1, 4.2,  and  4.3  show  scatter  plots  for  FI,  F2,  and  F3  respectively.  It  appears  from 
the  plots  that  pitch  frequency  and  formant  frequency  are  related.  Table  4.1  lists  key  statistics  for 
formant  frequencies  of  the  three  data  sets.  The  raw  data  is  listed  in  Appendix  H. 


*  Cepstral  pitch  detection  was  used  in  this  research,  backed  up  by  the  Average  Magnitude 
Difference  Function  [Ref.  1].  The  code  in  Appendix  A  can  be  consulted  for  specifics  on  the 
technique  used. 
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Figure  4.1 .  Plot  of  FI  frequency  versus  pitch  for  209  samples  of  the  lY  vowel  (70  speakers). 


PITCH/FORMANT 

REGRESSION 

REGRESS. 

SAMPLE 

SAMPLE 

CORRELATION 

COEFFICIENT  OF 

X 

FORMANT 

MEAN  (Hz) 

VARIANCE 

COEFFICIENT  (r) 

DETERMINATION  (i^) 

COEFFIC. 

F1 

362 

7,904 

0.35 

0.13 

0.08 

F2 

2,125 

80,262 

0.62 

0.38 

4.48 

F3 

2,729 

81,669 

0.58 

0.34 

4.25 

Table  4, 1 .  Statistics  from  209  Measurements  of  lY  Formant  Frequencies 


The  correlation  coefficients,  and  the  plots,  indicate  a  moderate  correlation  between  pitch 
and  formant  frequencies,  particularly  F2  and  F3.  The  correlation  between  pitch  and  FI  is 
weaker.  Furthermore,  the  weak  regression  coefficients  of  determination  (i^)  indicate  that  the 
relationships  are  nonlinear.  This  is  consistent  with  previous  research  on  the  subject  [Ref.  3]. 
Finally,  the  slopes  of  the  regression  lines  are  steeper  for  F2  and  F3  than  for  FI,  This  rules  out  a 
simple  shift  of  the  frequency  axis  proportional  to  pitch  changes.  Such  a  shift  would  overcorrect 
FI  or  undercorrect  F2  and  F3. 

2.  Normalization  Method  Used 

Because  the  neural  network  must  recognize  formant  peaks  in  the  overall  spectrum,  it  was 
necessary  to  use  a  normalization  method  that  did  not  depend  on  detection  of  the  formants 
themselves.  Instead,  each  frequency  component  of  the  FFT  output  was  shifted  by  an  amount 
proportional  to  both  the  frequency  being  shifted  and  the  difference  between  that  speakers  pitch 
and  a  "standard"  pitch  (100  Hz).  Thus,  the  expected  "error"  in  a  given  frequency  component  was 
modeled  by: 

formant  ~  k  f measured  pitch  (4  - 1 ) 

where  k  is  a  constant  of  proportionality.  If  the  speakers  measured  pitch  is  100  Hz,  the  right  side 
of  the  equation  would  equal  zero  and  no  correction  would  be  made.  For  any  other  pitch,  the 
measured  frequency  would  be  corrected  by  adding  the  left  side  of  Equation  4.1  to  the  measured 
frequency.  The  constant  of  proportionality  was  estimated  by  solving  for  k  as  follows: 


formant  !  pitch  k  fmeasured 
(Afformant  f  Afpimh)  /  fmeasured  ~  k 


(4.2) 

(4.3) 
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The  left  side  of  Equation  4.2  can  be  approximated  by  the  slopes  of  the  regression  lines  in 
Table  4.1.  A  value  for  is  obtained  for  each  formant  by  plugging  a  pitch  of  200  Hz  into 

the  equation  for  that  formant's  regression  line.  The  three  resulting  equations  form  the  divisors 
and  the  regression  slopes  are  the  dividends  in  Equations  4.4  through  4.6.  For  the  first  formant: 

F1:  0.805  7(0.805x200  +  251.4)  =  0.00195  (4.4) 

where  251.4  is  the  regression  line's  intercept  (Table  4.1).  For  F2: 

F2  :  4.48  /  (4.48x200  +  1508)  =  0.00186  (4.5) 

where  1508  is  the  intercept.  Finally,  for  F3: 

F3  :  4.25  7(4.25x200  +  2143)  =  0.00142  (4.6) 

where  2143  is  the  intercept. 

Because  the  F2  regression  line  best  fit  the  data,  the  value  from  Equation  4.5  was  given 
more  weight  and  0.00185  was  the  proportionality  constant  actually  used  to  perform 
normalization.  Equation  4.1  yields  an  estimate  of  the  difference  between  measured  frequency 
and  normalized  frequency.  Thus,  the  frequency  components  from  the  FFT  can  be  normalized  by: 


f normalized  ~  f measured  +  0.00185  f  measured  (100  -  f pitch  ) 


(4.7) 


The  right  addend  in  Equation  4.7  is  simply  Equation  4.1  with  the  proportionality  constant 

from  Equations  4.4  through  4.6.  The  parenthesized  quantity  is  the  difference  between  the 
speaker's  pitch  and  the  "standard"  pitch  of  100  Hz  (Afpitch  in  earlier  equations).  In  words,  the 
measured  frequency  is  corrected  by  an  amount  proportional  to  the  measured  frequency  and  the 

difference  between  the  speaker's  pitch  and  a  standard  pitch  (again,  100  Hz  was  used). 

3.  Impact  of  Normalization 

Figures  4.4  and  4.5  show  the  distribution  of  F2  frequencies  for  the  same  209  F2  data 
points,  before  and  after  normalization  using  Equation  4.7.  While  considerable  variance  remains, 
the  distribution  after  normalization  is  clearly  tighter.  As  shown  by  Table  4.2,  normalization  also 
reduces  the  variance  of  FI  and  F3  frequencies  for  the  same  data. 
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Figure  4.4.  Distribution  of  F2  frequencies  (lY  vowel)  before  normalization. 


Figure  4.5.  Distribution  of  F2  frequencies  after  pitch-based  normalization. 


FORMANT 

VARIANCE  BEFORE 
NORMALIZATION 

VARIANCE  AFTER 
NORMALIZATION 

F1 

7,904.5 

5,820 

F2 

80,261.7 

42,111.8 

F3 

81,669.1 

49,325.2 

Table  4.2.  Variance  of  Formant  Frequencies  for  lY  Vowel  Before  and  After  Frequency  Normalization 

No  attempt  was  made  to  collect  data  on  vowels  other  than  lY.  It  was  assumed  that  all 
vowels  would  exhibit  similar  relationships  between  pitch  and  formant  frequencies.  Instead, 
feature  vectors  were  produced  both  with  and  without  frequency  normalization.  Identical  neural 
networks  were  then  trained  on  the  same  speech  samples  from  the  same  speakers,  the  only 
difference  being  the  use  or  absence  of  normalization.  In  every  case  the  neural  network  trained 
with  normalized  frequencies  outperformed  the  network  trained  without  normalization.  Figure 
4.6  shows  the  learning  curves  from  the  largest  such  comparison.  The  solid  line  shows  the  test  set 
error  rate  for  the  network  trained  with  normalized  data.  The  dashed  line  shows  the  test  set  errors 
for  an  identical  network  trained  without  normalization.  In  this  case,  the  network  without 
normalization  suffered  over  34%  more  errors  (using  the  "top  three"  error  metric,  explained  in  the 
next  section). 


The  relationship  modeled  in  Equations  4.1  and  4.7  may  have  very  little  to  do  with  the  true 
relationship  between  pitch  and  formant  frequency.  But,  judging  from  the  data  gathered  so  far, 
the  frequency  normalization  method  described  in  this  section  is  at  least  useful. 

C.  DIPHONE  RECOGNITION  RESULTS 

1.  Error  Metrics 

Two  metrics  were  used  to  evaluate  the  accuracy  of  all  neural  networks  trained  in 
conjunction  with  this  research: 

a.  Percent  Correct 

This  metric  represents  the  percentage  of  test  set  diphones  which  triggered  the  correct 
neural  network  output.  In  other  words,  if  the  correct  output  neuron  had  the  strongest  output,  the 
diphone  was  correctly  recognized.  Test  set  diphones  (and  speakers)  were  never  used  to  train 
(adjust  weights  of)  the  networks.  So  these  figures  should  give  a  much  more  accurate  idea  of  a 
network's  generalization  ability  than  the  training  set  figures.  Therefore,  all  comments  and 
accuracy  figures  refer  to  the  test  set  unless  specified  otherwise. 

b.  Percent  in  Top  Three 

This  metric  is  the  percentage  of  diphones  which  activated  the  correct  output  neuron 
strongly  enough  to  place  it  among  the  top  three  neuron  outputs.  This  figure  is  useful  for  two 
reasons: 

•  The  lexical  analyzer  (described  in  Chapter  lH)  is  only  shown  the  network's  top  three 
outputs  for  each  recognition  window  position.  Thus,  the  diphone  must  be  in  the  top 
three  to  be  useful  at  the  next  stage  of  recognition.  Of  course,  the  lexical  analyzer 
could  be  redesigned  to  scan  the  top  four  or  five  neural  network  outputs.  But  that 
would  increase  its  own  search  space  considerably.  Thus,  the  top  three  figure  is  a 
tradeoff  between  the  diphone  recognizor  and  the  lexical  analyzer.  If  the  diphone 
recognizor  top  three  percentage  is  too  low  some  of  the  burden  must  be  shifted  to  the 
lexical  analyzer. 
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•  The  top  three  metric  is  commonly  used  in  speech  recognition  research.  So  its  use  here 
will  facilitate  comparisons  between  diphone  and  phoneme  recognition  results. 

Although  sum-squared  error  (SSE)  was  recorded  for  every  epoch  of  each  network,  it  is 
not  often  quoted  in  this  thesis.  In  many  cases,  sum-squared  error  drops  were  accompanied  by 
increases  in  test  set  errors. 

2.  Results 

The  first  entry  in  Table  4.3  describes  the  most  accurate  neural  network  of  those  trained  on 
the  large  and  medium-sized  data  sets  (157  diphones).  In  short,  this  network  correctly  recognized 
70.2%  of  the  test  diphones.  For  the  same  network,  89.0%  of  the  correct  diphones  were  among 
the  top  three  outputs. 

Table  4.3  also  lists  key  statistics  from  other  selected  neural  networks  trained  with  the 
medium-sized  and  large  data  sets  described  in  Chapter  in.  The  networks  included  in  this  table 
were  chosen  because  they  seemed  to  illustrate  the  impact  of  various  training  alternatives.  For 
example,  the  apparent  impact  of  frequency  normalization  can  be  seen  by  comparing  the  third  and 
fourth  networks  in  the  table.  The  third  network  was  trained  without  frequency  normalization. 
The  fourth  network  was  trained  with  normalization.  The  following  comments  also  apply  to  the 
table: 


•  The  numbers  in  the  first  column  of  the  table  are  included  only  to  simplify  references  in 
the  text  to  a  particular  network.  They  have  nothing  to  do  with  a  ranking  of  the  results 
or  the  sequence  in  which  the  networks  were  trained. 

•  The  training  set  accuracy  figures  are  for  the  same  epoch  as  the  test  set  figures.  It 
proved  possible  to  achieve  training  set  accuracy  of  100%  (or  very  nearly  100%)  on  all 
of  the  data  sets.  But  doing  so  requires  overtraining,  at  the  expense  of  test  set  accuracy. 

•  Training  was  stopped  when  test  set  accuracy  no  longer  appeared  to  be  improving.  It  is 
sometimes  difficult  to  judge  when  progress  has  truly  ended.  So  it  is  theoretically 
possible  that  some  networks  were  stopped  too  soon. 
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Table  4.3.  Key  Statistics  from  Selected  Neural  Network  Diphone  Recognizors 
3.  Caveat  on  Neural  Network  Comparisons 

In  the  discussions  that  follow,  every  effort  has  been  made  to  compare  results  between 
networks  that  are  identical  in  every  respect  except  the  parameter  at  issue.  For  example,  the  third 
and  fourth  networks  in  the  table  were  initialized  with  the  same  random  weights  and  they  were 
trained  on  the  same  speech  samples  from  the  same  speakers.  They  had  the  same  number  of 
hidden  neurons  and  both  networks  were  trained  without  feedback-based  time  alignment.  The 
only  significant  difference  was  in  the  use  of  frequency  normalization  for  the  third  network. 
However,  it  is  wise  to  remember  that  two  identical  networks  can  achieve  very  different  results 
simply  by  starting  at  different  points  on  the  error  surface.  Networks  6  and  7  in  Table  4.3 
illustrate  this  point.  The  only  difference  between  these  two  networks  is  in  the  random  seed  used 
to  initialize  MATLAB’s  pseudo-random  number  generator  prior  to  initialization  of  the  weights. 
Network  6’s  random  seed  was  set  to  zero  and  Network  7's  was  set  to  five.  Judging  from  the 
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learning  curves,  Network  7  fell  into  an  inferior  local  minimum  and  never  achieved  the  accuracy 
of  Network  6. 

To  eliminate  this  potentially  confounding  variable,  and  to  make  any  particular  network's 
course  of  training  repeatable,  the  networks  trained  in  this  research  all  started  with  random  seeds 
of  zero  unless  specified  otherwise.  But  this  is  still  no  guarantee  that  a  particular  comparison  will 
reliably  indicate  the  impact  of  a  given  parameter.  Even  networks  with  the  same  initial  weights 
can  fall  into  different  local  minima  if  something  causes  them  to  take  different  paths  down  the 
error  surface.  In  one  case,  a  network  that  had  been  training  for  several  weeks  was  interrupted  by 
the  failure  of  the  host’s  hard  drive.  Training  was  resumed  on  another  host  using  a  backup  copy  of 
some  earlier  weights.  But  the  second  host  never  achieved  the  accuracy  level  of  the  first.  The 
second  network  took  a  different  path  down  the  error  surface  because  the  training  algorithm  uses 
an  adaptive  learning  rate  and  a  momentum  term.  When  training  was  interrupted  the  learning  rate 
dropped  back  to  the  default  and  the  momentum  term  was  zeroed  out  for  the  first  epoch.  This 
caused  the  second  network  to  deviate  from  the  path  taken  by  the  first.  Unfortunately,  the  second 
path  was  inferior.  It  would  be  possible  to  reproduce  the  results  of  the  first  run,  given  some 
patches  to  the  training  code  and  a  few  more  weeks  of  training  time.  But  some  other  network 
might  just  as  easily  benefit  from  an  unplanned  interruption. 

Although  the  preceding  discussion  suggests  that  comparisons  between  neural  network  runs 
are  risky,  trends  are  difficult  to  ignore.  None  of  the  conclusions  described  in  the  following 
subsections  are  based  on  a  single  comparison.  For  example.  Networks  5  and  8  show  essentially 
the  same  frequency  normalization  advantage  as  do  Networks  3  and  4.  And  early  test  runs  on 
smaller  data  sets  support  the  same  conclusions.  Similarly,  the  advantages  of  feedback  based  time 
alignment  were  apparent  in  dozens  of  training  rans.  Again,  the  comparisons  discussed  in  the  text 
are  based  on  careful  matching  of  training  parameters,  including  factors  such  as  the  momentum 
term  and  error  ratio  used  by  MATLAB's  training  algorithms. 

The  impact  of  frequency  normalization  has  already  been  discussed.  The  following  subsections 
describe  the  apparent  impact  of  a  number  of  other  key  training  alternatives. 
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4.  Number  of  Output  Classes 

Although  all  of  the  entries  in  Table  4.3  describe  neural  networks  trained  on  157  diphones,  a 
number  of  smaller  networks  were  trained  to  recognize  smaller  diphone  sets.  Table  4.4  shows  the 
highest  accuracy  rates  achieved  for  the  three  diphone  set  sizes.  The  Small  IH  data  set  training  set 
consisted  of  30  examples  of  each  of  19  diphones,  all  starting  with  the  vowel  IH.  The  Small  IH-T 
data  set  consisted  of  30  examples  of  each  of  44  diphones,  all  starting  with  the  IH  vowel  or  the  T 
consonant  (thus,  it  is  a  superset  of  the  Small  IH  data  set).  The  medium  data  set  was  described  in 
Chapter  HI.  In  hindsight,  some  of  the  training  techniques  used  on  the  smaller  networks  were 
inferior.  Yet  the  smaller  networks  achieved  noticeably  higher  accuracy  levels.  This  issue  is 
revisited  in  Chapter  VI  and  the  next  paragraph's  discussion  of  input  batching. 


DATA  SET 

NUMBER 

DIPHONES 

TRAIN  /  TEST 
SAMPLES 

TRAINING 

CORRECT 

TRAINING 
TOP  3 

TEST 

CORRECT 

TEST 
TOP  3 

SMALL  (IH) 

19 

570/181 

100% 

100% 

84.0% 

100% 

44 

1320/397 

100% 

100% 

71.5% 

95.7% 

MEDIUM 

157 

4710/1568 

99.1% 

99.2% 

62.5% 

83.5% 

Table  4.4.  Comparison  of  Accuracy  Levels  for  Neural  Networks  with  Varying  Numbers  of  Output  Classes 


5.  Size  of  Training  Set 

As  expected,  networks  trained  on  the  large  data  set  performed  better  on  their  test  sets  than 
networks  trained  with  the  medium  data  set.  Network  1  in  Table  4.3,  for  example,  was  trained 
using  100  samples  of  each  diphone  (15,700  input  vectors).  Network  19,  identical  in  all  other 
respects,  was  trained  using  just  30  samples  of  each  diphone  (4,710  input  vectors).  The  larger 
training  set  clearly  improved  Network  I's  generalization  ability. 

6.  Batching  of  Weight  Changes 

Weight  changes  from  each  test  set  input  vector  are  often  accumulated  until  the  end  of  an 
epoch  and  then  all  applied  at  once.  This  is  commonly  known  as  input  or  weight  change 
batching.  This  method  is  the  default  in  MATLAB's  Neural  Network  Toolbox.  It  was  also  used 
successfully  in  training  a  number  of  networks  to  recognize  the  two  smaller  diphone  sets.  But 
dozens  of  networks  failed  to  converge  during  batched  input  training  on  the  medium  data  set. 
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Training  was  attempted  with  hidden  layer  sizes  ranging  from  64  to  512  neurons.  Figure  4.7 
shows  the  SSE  plot  {learning  curve)  for  one  such  network  with  256  hidden  neurons.  Note  the 
abrupt  leveling  off  of  the  curve  at  about  the  tenth  epoch.  This  is  typical  of  the  neural  networks 
that  became  stuck  in  local  minima.  For  comparison  purposes,  the  dashed  line  in  the  same  figure 
shows  the  SSE  plot  for  a  later  network  which  converged  to  a  much  lower  minimum.  The 
principal  difference  in  the  second  network  is  the  absence  of  weight  change  batching.  In  the 
second  network,  and  all  of  the  networks  successfully  trained  on  the  medium-sized  and  large  data 
sets,  weight  changes  were  made  immediately  after  each  input  vector. 


Figure  4.7.  Error  plots  for  identical  networks,  with  and  without  batching  of  weight  changes.  The 
network  with  batched  weight  changes  (solid  line)  stopped  learning  at  approximately  epoch  10. 

Weight  change  batching  is  discussed  in  [Ref.  13].  The  crux  of  that  discussion  is  that 
batching  is  useful  in  some  situations.  But  it  depends  on  the  nature  of  the  problem.  In  other 
situations,  input  batching  can  render  a  network  more  prone  to  becoming  stuck  in  local  minima. 
Network  10  in  Table  4.3  also  illustrates  this  point.  This  network  correctly  identified  only  68  of 
the  157  diphones  in  its  initial  training  set  (just  one  example  of  each  diphone).  Apparently, 
weight  change  batching  is  of  questionable  value  in  diphone  recognition  systems. 
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7.  Hidden  layer  sizes 

The  majority  of  the  networks  trained  on  the  large  and  medium-sized  data  sets  included  256 
neurons  in  the  hidden  layer.  This  produced  the  lowest  error  rate  of  the  networks  tested  without 
input  batching.  Figure  4.8  shows  test  set  performance  for  the  hidden  layer  sizes  which  performed 
well  enough  initially  to  justify  training  on  the  full  training  set.  It  may  be  that  a  smaller  number  of 
hidden  neurons  would  suffice.  But  there  would  probably  be  no  advantage  to  pruning  the  hidden 
layer  other  than  reducing  the  CPU's  floating  point  burden.  Research  by  Sietsma  and  Dow 
[Ref.  28]  suggests  that,  contrary  to  neural  network  "folklore,"  neural  networks  with  smaller 
hidden  layers  do  not  generalize  better  than  those  with  oversized  hidden  layers.  Networks  often 
generalize  poorly  because  of  overtraining.  But  if  training  is  stopped  when  test  set  errors  reach  a 
minimum,  and  training  is  conducted  with  "noisy"  data,  there  is  no  accuracy  penalty  for  oversized 
hidden  layers.  These  conclusions  are  consistent  with  informal  observations  on  the  issue  during 
this  research. 


Figure  4.8.  Medium  test  set  errors  versus  hidden  layer  size  (with  batched  weight  changes).  Most 
networks  successfully  trained  on  the  medium  and  large  data  sets  employed  256  hidden  neurons  (the  error 
minimum  in  the  above  plot).  However,  more  data  points  would  reveal  that  the  above  relationship  is  not 
smooth  (peaks  and  valleys  exist  between  the  points  plotted). 
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8.  Randomized  Sequencing 

When  inputs  are  batched,  input  vectors  may  be  presented  in  any  sequence  during  a  training 
epoch.  Since  all  weight  changes  are  deferred  until  the  end  of  the  epoch,  their  ordering  is 
irrelevant.  That  is  not  the  case  when  input  batching  is  disabled.  Without  input  batching,  weight 
changes  for  a  given  input  vector  are  determined  after  applying  weight  changes  for  the  preceding 
input  vectors.  So  changes  in  the  input  sequence  can  lead  to  entirely  new  paths  down  the  error 
surface.  Therefore,  randomized  sequencing  of  inputs  is  often  recommended  [Ref.  13].  When 
training  on  just  one  example  of  each  of  157  diphones,  randomized  sequencing  did  indeed  appear 
to  speed  up  network  convergence.  However,  when  training  with  the  medium  data  set  (30 
examples  of  each  of  157  diphones),  there  was  no  apparent  advantage  to  "shuffling"  the  input 
vectors  before  each  epoch.  For  example.  Network  11  in  Table  4.3  was  trained  by  shuffling  the 
input  vectors  before  every  epoch.  The  input  vectors  for  Network  12,  on  the  other  hand,  were 
shuffled  just  once,  prior  to  the  first  epoch.  The  input  sequence  was  identical  for  the  remaining 
epochs.  The  results  for  these  two  networks  (and  similar  comparisons)  imply  there  is  no 
significant  benefit  to  frequent  re-sequencing  of  the  inputs.  Therefore,  the  added  algorithmic 
complexity  and  run  time  appear  to  be  unjustified. 

9.  Feature  Vector  Length 

As  discussed  in  Chapter  HI,  the  recognition  window  was  initially  23  (overlapped)  sample 
windows  or  502  milliseconds  wide.  This  design  worked  well  enough  during  initial  testing  with 
19  diphones  (dominated  by  vowels).  But  the  error  rate  increased  considerably  during  testing  with 
a  data  set  of  44  diphones  (and  a  larger  proportion  of  stop  consonants  and  consonant  pairs).  The 
error  statistics  implied  that  the  shorter  diphones  were  suffering  higher  error  rates  than  the  longer 
ones.  So  tests  were  run  with  these  44  diphones  and  recognition  window  widths  of  7,  9,  11,  13, 
15,  17,  19,  and  23  sample  windows.  Error  rates  were  consistently  lowest  when  the  input  vector 
was  13  sample  windows  wide  (289  milliseconds).  Also,  comparisons  were  later  run  with  the 
medium  data  set  and  input  vector  widths  of  11,  13,  and  15  sample  windows.  The  results  can  be 
seen  in  table  4.3  (Networks  13,  14,  and  15  respectively).  Although  the  difference  between  11 
and  13  sample  windows  is  less  apparent  with  the  larger  data  set,  both  are  clearly  better  than  the 
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15  sample  window  version.  Therefore,  a  recognition  window  width  of  about  289  milliseconds 
appears  to  be  a  reasonable  compromise,  at  least  for  the  157  diphones  used  in  this  research. 

10.  Training  Phases 

Some  early  networks  were  trained  on  the  entire  training  set  from  the  first  epoch.  This 
method  had  several  disadvantages: 

•  A  trial  and  error  approach  had  to  be  used  to  determine  the  number  of  neurons  required 
in  the  hidden  layer.  Much  CPU  time  was  wasted  in  training  networks  for  hundreds  of 
epochs  on  thousands  of  files,  only  to  discover  that  the  networks  needed  more  hidden 
neurons. 

•  Some  networks  failed  to  converge,  even  though  later  trials  demonstrated  that  the 
networks  had  enough  hidden  nodes. 

Both  of  these  problems  were  addressed  by  training  later  networks  with  a  single  (arbitrarily 
selected)  example  of  each  diphone  initially.  After  the  networks  performed  well  on  the  initial  set, 
training  was  resumed  with  the  complete  training  set.  With  only  157  files  in  the  initial  set,  epochs 
accumulated  quickly.  If  the  network  failed  to  learn  the  initial  set,  more  hidden  nodes  could  be 
added  and  another  attempt  made  with  little  time  lost.  This  technique  also  seemed  to  ultimately 
result  in  more  accurate  networks.  Networks  16  and  17  in  Table  4.3  are  examples.  However,  the 
difference  was  not  dramatic  and  it  was  much  less  apparent  for  some  comparisons  than  others. 
Also,  training  for  too  long  on  the  initial  set  actually  reduced  final  network  performance. 
Network  9  in  table  4.3,  for  example,  was  identical  with  Network  8  in  every  respect  but  one,  the 
length  of  training  on  the  initial  set. 

It  should  be  noted  that  these  observations  are  consistent  with  the  recommendation  of  many 
researchers  that  initial  training  be  conducted  without  "noise."  For  example,  an  optical  character 
recognition  system  should  be  trained  on  sharply  defined  letters  before  attempting  to  train  on 
smudged  ones.  Although  no  speaker's  diphones  can  be  considered  "noise  free,"  the  initial  set  in 
the  preceding  discussion  takes  on  a  similar  role.  The  individual  variations  of  other  speakers  can 
then  be  viewed  as  a  form  of  noise. 
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11.  Feedback-Based  Time  Alignment 

Figure  4.9  shows  the  output  of  the  N-D  output  neuron  as  a  neural  network's  recognition 
window  is  stepped  through  a  speech  sample  containing  the  N-D  diphone.  Note  that  the  neural 
network's  output  peaked  sharply  when  the  recognition  window  was  properly  aligned  and  fell  off 
rapidly  both  before  and  after  that  window  position.  Figure  4.10  shows  the  same  output  sequence 
from  a  different  perspective.  The  solid  line  in  Figure  4.10  is  the  same  output  plotted  in  Figure 
4.9.  The  dashed  lines  show  the  outputs  of  competing  neurons  during  the  same  time  period. 
Again,  the  penalty  for  recognition  window  misalignment  is  obvious.  Because  of  the  dynamic 
nature  of  diphone  phoneme  transitions,  a  shift  of  just  one  window  position  (about  20 
milliseconds,  10  with  interleaving  as  described  in  Chapter  III)  can  completely  remove  the  correct 
output  from  the  top  three  outputs. 


Output  of  N-D  Diptune  Detector  During  the  Word  ‘Cleaned* 


Figure  4.9.  Output  of  the  N-D  neuron  as  the  recognition  window  is  stepped  through  a  speech 
sample  containing  the  N-D  diphone.  Note  that  the  neural  network  is  sensitive  to  correct  alignment 
(window  positions  are  about  10  milliseconds  apart). 

This  again  raises  an  issue  discussed  in  Chapter  HI,  recognition  window  time  alignment 
During  training,  the  TIMIT  phonetic  transcriptions  provide  a  rough  guide.  However,  as 
discussed  in  Chapter  HI,  these  transcriptions  are  only  approximate  and  they  may  be  based  on 
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time-domain  features  that  may  be  less  important  to  the  neural  networks  than  other  (frequency 
domain)  features.  Furthermore,  during  recognition  the  networks  must  recognize  diphones  with 
no  a  priori  knowledge  of  where  in  the  speech  stream  to  look. 


N-D  Output  (solid  line)  Competing  with  Other  Diphones 


Figure  4.10.  The  same  output  as  in  Figure  4.9,  but  joined  by  plots  of  the  other  156  output  neurons 
during  the  same  time  period. 

The  recognition  time  alignment  approach  was  described  in  Chapter  m.  The  top  three 
network  outputs  are  recorded  for  every  window  position  during  recognition.  The  lexical  analyzer 
then  sorts  out  which  output  peaks  to  believe.  The  training  problem  is  actually  more  challenging. 
The  TIMIT  transcriptions  are  adequate  during  initial  training.  But  small  variations  in  time 
alignment  apparently  blur  diphone  features  as  the  networks  become  more  discriminating.  Thus, 
recognition  window  alignment  should  be  adjusted  as  carefully  as  possible  prior  to  all  weight 
changes.  The  following  subsections  describe  three  approaches  to  time  alignment  and  the  results 
obtained  from  each. 
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a.  Timit  Transcription-Based  Alignment 

Network  17  in  Table  4.3  was  trained  using  only  the  TIMIT  phonetic  transcriptions  to 
time  align  the  recognition  window.  In  dozens  of  comparisons,  this  method  limited  the  neural 
networks  to  accuracy  levels  lower  than  those  obtained  with  both  of  the  other  methods. 


b.  Feedback-Based  Alignment  Started  Early 

Several  dozen  networks  were  first  trained  on  an  initial  set  (one  example  of  each 
diphone)  and  then  with  feedback-based  time  alignment  on  the  entire  training  set.  Network  18  in 
Table  4.3  was  trained  in  this  manner.  These  networks  converged  to  lower  error  levels  than  those 
trained  with  transcription-based  alignment.  However,  they  also  exhibited  more  variability  in 
recognition  window  offsets  (from  the  transcription-based  window  position)  than  expected. 
Figure  4.11  is  a  histogram  of  test  set  recognition  window  offsets  for  a  small  network  (44 
diphones,  1320  training  files,  and  397  test  files)  trained  with  this  method. 


Test  Offsets  for  Small  Data  Set  &  Very  Early  Feedback 


Figure  4.1 1 .  Offsets  between  recognition  window  center  and  transcription-based  diphone  position 
for  a  network  trained  with  early  feedback-based  time  alignment. 

After  training  a  number  of  networks  using  this  method,  one  network  proved  slow  to 

converge  on  the  initial  set.  Thus,  this  network  trained  substantially  longer  (several  hundred 
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epochs)  on  one  example  of  each  diphone  before  feedback  was  used  to  time-align  the  entire 
training  set.  The  resulting  weights  performed  better  on  the  test  set  and  also  exhibited  less 
variability  in  feedback-based  recognition  window  offsets.  Apparently,  after  lengthier  training  on 
the  initial  set  the  network  was  better  able  to  sense  the  "correct"  recognition  window  position. 

Figure  4.12  is  a  histogram  of  test  set  recognition  window  offsets  for  a  network 
immediately  after  lengthy  training  on  the  medium  data  set's  initial  set  (one  example  of  each  of 
157  diphones).  Figure  4.13  is  a  histogram  of  the  same  network's  test  set  window  offsets  after 
training  on  the  entire  training  set.  These  distributions  are  clearly  tighter  than  in  Figure  4.11. 
Because  lengthier  initial  set  training  improved  error  performance,  it  is  reasonable  to  expect  even 
better  performance  from  networks  trained  on  the  entire  training  set  before  relying  on  network 
feedback  for  recognition  window  alignment.  The  results  of  this  approach  are  described  in  the 
next  subsection. 


Test  File  Offsets  Before  Training  on  Full  Training  Set 


Figure  4.12.  Medium  test  set  recognition  window  offsets  for  a  network  immediately  after  lengthy 
training  on  the  medium  data  set's  initial  set  (one  example  of  each  of  157  diphones) 
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Test  File  Offsets  After  Early  Start  of  Time  Alignment 


Figure  4.13.  Test  set  recognition  window  offsets  (for  the  same  network  as  in  Figure  4.12)  after  training 
with  feedback-based  alignment  on  the  full  training  set  (and  lengthier  initial  set  training  than  in  Figure  4.1 1). 
Note  that  the  neural  network  usually  preferred  recognition  window  positions  clustered  around  the  position 
indicated  in  the  phonetic  transcription. 


c.  Feedback-Based  Alignment  After  Training  With  Transcription 

Networks  1 1,  12,  and  19  in  Table  4.3  were  trained  on  the  medium  data  set's  initial  set, 
followed  by  transcription-based  time  alignment  on  the  entire  training  set.  When  test  set  errors 
appeared  to  have  reached  a  minimum,  training  was  completed  with  feedback-based  alignment 
and  the  entire  training  set.  As  a  precaution,  recognition  window  offsets  were  also  limited  to  plus 
or  minus  two  window  positions  from  the  transcription-based  position.  This  method  resulted  in 
the  lowest  error  rate  of  all  three  methods  used.  Figures  4.14  and  4.15  also  show  that  these 
networks  exhibit  less  variability  in  window  offsets,  both  before  and  after  the  use  of 
feedback-based  alignment. 
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Figure  4.14.  Test  set  recognition  window  offsets  after  lengthy  training  on  the  full  training  set 
(but  before  training  with  feedback-based  alignment).  Note  that  the  additional  training  has 
apparently  improved  the  network's  ability  to  determine  when  the  diphone  is  properly  aligned  with 
the  recognition  window. 


Figure  4.15.  Test  set  offsets  after  training  with  feedback-based  alignment.  Note  that  the  network 
still  favors  positions  near  the  transcription  position,  even  after  being  given  the  freedom  to  deviate. 
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Figure  4.16  is  a  plot  of  test  set  errors  for  Network  1.  Note  the  sharp  drop  in  errors  when 
feedback-based  alignment  began  (epoch  196).  Feedback-based  offsets  are  recomputed  every  five 
epochs.  So  one  would  expect  the  quality  of  the  network's  feedback  to  improve  as  training 
progresses.  Although  it  is  difficult  to  determine  all  of  the  mechanisms  involved,  the  error  rate 
does  continue  to  improve  for  some  time  after  feedback-based  time  alignment  commences. 


Test  Set  Error  Percentages  versus  Epoch 


Figure  4.16.  Test  set  error  percentage  for  Network  1  in  Table  4.3.  Note  the  sharp  drop  in  errors  when 
feedback-based  alignment  began  at  epoch  1 96. 

12.  Short-Time  Power  Versus  RMS  Amplitude 

As  discussed  in  Chapter  HI,  the  input  feature  surface  was  initially  designed  with  one 
column  devoted  to  short  time  power.  Figure  4.17  shows  plots  of  short  time  power  for  5 
examples  of  the  T-IY  diphone.  Note  that  one  of  the  plots  is  an  outlier.  Apparently,  that  speaker 
was  louder  than  most  of  the  other  speakers.  Unfortunately,  the  squaring  operation  used  to  arrive 
at  short-time  power  accentuated  the  differences  between  the  outlier's  waveform  and  the  other 
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signals.  Figure  4.18  shows  the  same  signals  converted  to  RMS  amplitude.  Note  that  the  outlier 
is  much  less  prominent  in  the  second  figure. 


Figure  4.17.  Short  time  power  for  5  examples  of  the  T-IY  diphone.  Note  that  one  of  the 
plots  is  an  outlier.  The  outlier's  differences  are  accentuated  by  the  squaring  operation. 


Figure  4.18.  RMS  amplitude  for  the  same  5  diphone  samples  as  in  Figure  4.17.  Note  that  the 
outlier's  differences  are  now  less  extreme. 


Judging  from  several  comparisons,  including  Networks  1  and  19  in  Table  4.3,  RMS 
amplitude  is  indeed  a  better  feature  to  use  in  diphone  recognition  than  short-time  power.  This  is 
consistent  with  observations  by  Hamming  [Ref.  29]  that  L,  measures  are  often  more  useful  in 
perceptual  matters  than  L2  measures. 

D.  WORD  RECOGNITION  RESULTS 

Figure  4.19  shows  a  plot  of  all  157  outputs  from  Network  1  (Table  4.3)  when  presented 
with  a  speech  sample  containing  the  word  "cleaned."  Some  of  the  output  spikes  are,  of  course, 
from  diphone  recognizors  properly  triggered  by  this  word.  The  other  spikes  are  noise  from  false 
neuron  activations.  Figure  4.20  shows  a  contour  plot  of  only  the  relevant  diphone  detector 
outputs.  Note  that  it  is  possible  to  visually  construct  the  correct  word  from  the  peaks  in  this 
figure. 


Figure  4.19.  A  plot  of  all  157  outputs  from  Network  1  (Table  4.3)  when  presented  with  a  speech 
sample  containing  the  word  "cleaned." 
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Figure  4.20.  A  contour  plot  of  only  the  relevant  diphone  detector  outputs  from  Figure  4.19.  A  diagonal 
from  the  upper  left  to  lower  right  corners  roughly  connects  the  output  peaks  for  the  word  "cleaned." 

Figure  4.21  shows  the  corresponding  text  output  from  the  diphone  detector 
(RECOGNIZ.M  in  Appendix  A).  As  discussed  in  Chapter  m,  only  the  three  strongest  outputs 
are  passed  to  the  LISP  lexical  analyzer. 

Finally,  Figure  4.22  shows  a  portion  of  the  output  from  the  LISP  lexical  analyzer.  Notice 
that  the  correct  word  was  detected  in  the  diphone  stream.  The  syntactic  and  semantic  analyzers 
would  be  expected,  in  a  continuous  speech  recognition  system,  to  select  from  the  candidate 
words  nominated  by  the  lexical  analyzer. 

Using  the  157  diphones  selected  for  the  medium  and  large  data  sets,  3070  of  the  110,935 
entries  in  the  Carnegie  Mellon  lexicon  (version  0.3  dated  9-7-94)  can  be  constructed.  The  same 
diphones  allow  a  vocabulary  of  340  out  of  the  6224  entries  in  the  TIMIT  lexicon.  Of  these,  1977 
samples  of  120  words  could  be  found  in  the  SX  test  set  sentences.  Unfortunately,  these  120 
words  did  not  include  uniform  coverage  of  the  157  diphones  in  the  Medium  and  Large  data  sets. 
For  example,  the  word  "the"  alone  accounted  for  409  of  the  1171  in-vocabulary  speech  samples. 
Thus,  the  word  recognition  statistics  presented  in  Table  4.5  do  not  necessarily  indicate  accuracy 
levels  achievable  on  more  uniform  samples. 
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(wordjs  cleaned) 

((171  (EHS  ) 0.317149) (171  (IH NG  ) 0.195618)  (171  (RAA  )0.116300)) 

((4788  (FR  )  0.695278)  (4788  (KR  )  0.202686)  (4788  (K  L  )  0.054996)) 

((4959  (RIY  )  0.067056)  (4959  (L  lY  )  0.014768)  (4959  (K  R  )  0.007128)) 
((5130  (LIY  )  0.153800)  (5130  (IX  Z  )  0.134072)  (5130  (IX  N  )  0.037814)) 
((5301  (AX  N  )  0.438422)  (5301  (N  lY  )  0.025831)  (5301  (DH  lY  )  0.003564)) 
((5472  (AX  N  )  0.324274)  (5472  (D  IH  )  0.029013)  (5472  (L  EH  )  0.018131)) 
((5643  (NIH  )  0.073388)  (5643  (AXR  Z  )  0.018055)  (5643  (N  D  )  0.016444)) 
((5814  (IH  NG  )  0.126056)  (5814  (lY  IX  )  0.114169)  (5814  (N  IH  )  0.059837)) 
((5985  (IH  NG  )  0.598576)  (5985  (IX  M  )  0.141212)  (5985  (IH  M  )  0.022087)) 
((6156  (IX M  ) 0.354304)  (6156  (IH NG  ) 0.092821)  (6156  (NIX  ) 0.091743)) 
((6327  (IH  NG  )  0.203180)  (6327  (S  AX  )  0.043799)  (6327  (EY  N  )  0.038053)) 
((6498  (EYN  )  0.021970)  (6498  (lY  N  )  0.013394)  (6498  (IX  T  )  0.006094)) 
((6669  (lYN  )  0.146805)  (6669  (AEM  )  0.114745)  (6669  (KN  )  0.029980)) 
((6840  (AX  V  ) 0.180183)  (6840 (AEM  ) 0.013460)  (6840  (R IH  ) 0.01 1701)) 
((7011  (WIH  )  0.031 189)  (701 1(ND  )  0.009318)  (7011  (MP  )  0.002605)) 
((7182 (KR  ) 0.391690) (7182 (ND  ) 0.031431) (7182 (REY  ) 0.007329)) 
((7353  (KR  ) 0.137409) (7353  (ND  ) 0.122368) (7353 (NDH  ) 0.104988)) 
((7524  (ND  )  0.323681)  (7524  (MP  )  0.277733)  (7524  (N  T  )  0.145333)) 
((7695  (ND  ) 0.594521)  (7695  (M EY  )  0.056732)  (7695  (N  S  ) 0.026563)) 
((7866  (RAE  )  0.060286)  (7866  (N  S  )  0.049713)  (7866  (HX  D  )  0.029375)) 
((8037  (K  S  )  0.017934)  (8037  (N  S  )  0.012144)  (8037  (AXR  Z  )  0.007918)) 
((8208  (AXR  Z  )  0.233987)  (8208  (AX  N  )  0.009879)  (8208  (K  W  )  0.008760)) 
((8379  (AXR  Z  )  0.016868)  (8379  (IX  T  )  0.009465)  (8379  (D IX  )  0.003793)) 
((8550 (lY IX  ) 0.001285) (8550 (RAA  ) 0.000765) (8550 (W lY  ) 0.000573)) 
((8721  (JH IX  )  0.051646)  (8721  GY  S  )  0.044583)  (8721  (T  HX  )  0.006520)) 
((8892  (JH IX  )  0.215402)  (8892  (D  HX  )  0.034956)  (8892  (lY  S  )  0.024411)) 
((9063  (KAE  )  0.123909)  (9063  (IX  N  )  0.106962)  (9063  (IH  Z  )  0.065737)) 
((9234  (IX  Z  )  0.290279)  (9234  (m  Z  )  0.01 1543)  (9234  (Z  AX  )  0.006819)) 
((9405  (DC  Z  )  0.978204)  (9405  (AX  V  )  0.038427)  (9405  (IH  Z  )  0.005301)) 
((9576  (SH IX  )  0.236237)  (9576  (IX  V  )  0.058200)  (9576  (AX  V  )  0.020296)) 
((9747  (ST  ) 0.109034)  (9747  (KT  ) 0.056291)  (9747  (SH DC  ) 0.034269)) 
((9918  (AX  L  )  0.075804)  (9918  OYN  )  0.008476)  (9918  (Z  DC  )  0.006940)) 
((10089  (DHX  )  0.118372)  (10089  (TEH  )  0.112912)  (10089  (S  HX  )  0.030426)) 

((10602  (ZHX  )  0.330007)  (10602  (S  AH  )  0.132011)  (10602  (D  R  )  0.089018)) 
(word_end  cleaned) 


Figure  4.21 .  Output  from  all  157  diphone  detectors  for  the  speech  sample  plotted  in  Figures  4.19  and 
4.20.  Only  the  three  strongest  outputs  for  each  time  step  are  included  in  the  output  (and  passed  to  the 
LISP  lexical  analyzer),  with  one  output  line  per  time  step.  The  diphone  outputs  used  by  the  lexical 
analyzer  to  build  the  word  "cleaned"  are  in  bold  type.  Each  diphone  output  consists  of  the  sample  number 
where  the  diphone  was  detected,  the  diphone  name,  and  the  numerical  output  of  the  neuron  (used  as  a 
measure  of  confidence  by  the  lexical  analyzer).  The  "word_is"  and  "word_end"  labels  at  the  beginning 
and  end  of  the  stream  are  used  to  capture  error  statistics,  not  for  recognition. 
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- [  WORD  RECOGNIZOR  OUTPUT  ] - 

(wordjs  cleaned) 

((171  7182)  ROCK  0.253995) 

((4617  9063)  WICK  0.057184998) 

((4617  9063)  QUICK  0.057184998) 

((4788  7182)  RAY  0.3513035) 

((4788  10602)  PHRASE  0.34420466) 

((4788  7524)  RIM  0.32823732) 

((4788  5985)  CLING  0.326786) 

((4788  8892)  RIDGE  0.30746034) 

((4788  10089)  READ-V_PRES  0.29356867) 

((4788  10602)  RACE  0.278206) 

((4788  9747)  WRIST  0.27200434) 

((4788  10089)  RATE  0.27183965) 

((4788  7695)  CLEAN  0.26777232) 

((4788  10431)  REAL  0.25929865) 

((4788  9747)  RAKED  0.252966) 

((4788  10431)  RAIL  0.23938966) 

((4788  10089)  CLEANED  0.23042224)  <-  Correct  word  ranked  14th  among  words  with  same  start  time 

((4788  7695)  SCREEN  0.21856575) 

((4788  10089)  AFRAID  0.20774475) 

((4788  10602)  CLEANS  0.171402) 

((4788  10602)  LAMB--S  0.16658266) 

((4788  10602)  CLAMS  0.16658266) 

((4788  7524)  RIM  0.16404) 

((4788  7524)  LAMP  0.149158) 

((4788  7524)  CLAMP  0.149158) 

((4788  8208)  CLEANERS  0.14739701) 

((4959  6669)  THREE  0.07462034) 

((4959  6669)  TREE  0.07462034) 

((5130  10602)  EASE  0.24190348) 

((5130  7695)  CLEANED  0.2262815)  <-  Same  instance  of  word,  but  built  starting  at  2nd  diphone 

((5130  7182)  KNICK-  0.16763067) 

((5130  10602)  CLEANS  0.160153) 

((5130  8208)  CLEANERS  0.136148) 

((5130  10089)  LEAD  0.136086) 

- [  debugging 

(CLEANED  ((4788  (K  L)  0.054996) 

(5130  (LIY)  0.1538) 

(7695  (ND)  0.594521) 

(10089  (DHX)  0.118372)) 

0.23042224) 

(CLEANED  ((-l(KL)O.Ol) 

(5130  (LIY) 0.1538) 

(6669  (lYN)  0.146805) 

(7695  (ND) 0.594521)) 

0.2262815) 


OUTPUT  ] - 

<-  Correct  word  built  starting  from  K-L  diphone 


<-  Dummy  diphone  inserted  by  lexical  analyzer 

<-  This  word  instance  constructed  by  starting  search  from  L-IY 


Figure  4.22.  Output  stream  from  the  lexical  analyzer  for  the  speech  sample  of  Figures  4.19  through  4.21 . 
Each  word  candidate  is  listed  with  its  start  sample  number  and  overall  confidence  value  (average  of  its 
diphone  confidence  values.  A  more  detailed  output  format  (for  debugging)  is  listed  at  the  bottom.  Note 
the  large  number  of  rhyming  words  and  word  fragments  detected.  Note  also  that  the  analyzer  was  able  to 
detect  the  correct  word  even  without  the  first  diphone. 
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DESCRIPTION 

OF  THOSE 
DETECTED 

OVERALL 

Percentage  of  time  correct  word  was  in  lexical  analyzer's  output  stream 

- 

85.0% 

Correct  word  ranked  first  of  word  candidates  with  same  start  time 

41.6% 

35.4% 

Correct  word  ranked  among  top  3  word  candidates  with  same  start  time 

60.0% 

51.0% 

Correct  word  ranked  among  top  6  word  candidates  with  same  start  time 

74.0% 

62.9% 

Table  4.5.  Summary  of  Word  Recognition  Statistics  Using  Network  1  from  Table  4.3 


The  neural  network  and  lexical  analyzer  were  deliberately  presented  with  speech  segments 
that  started  before  and  ended  after  the  word  being  tested.  Thus,  the  neural  network  and  lexical 
analyzer  achieved  the  results  of  Table  4.5  on  what  amounts  to  continuous  speech.  Also,  the  word 
recognition  tests  were  conducted  before  (by  about  two  percent)  Network  1  reached  the  error 
minimum  listed  in  Table  4.3.  So  slightly  higher  scores  might  be  achievable,  even  with  the 
limitations  of  the  word  test  set. 

As  mentioned  in  the  table's  description  column,  the  accuracy  rates  in  Table  4.5  are  for  all 
word  candidates  with  starting  sample  numbers  that  match  the  correct  word’s  starting  sample 
number.  In  many  cases,  the  correct  word  was  both  globally  and  locally  the  strongest  candidate 
nominated  by  the  lexical  analyzer.  In  other  cases,  the  correct  word  had  a  high  ranking  locally 
(among  word  candidates  with  the  same  start  time)  but  other  words  ranked  higher  globally.  In  the 
latter  case,  it  would  be  up  to  the  syntactic  and  semantic  analyzers  to  properly  segment  the 
sentence  and  select  the  appropriate  word  candidates.  On  the  average,  the  85  percent  of  the 
correct  words  that  were  detected  were  ranked  number  5.44  among  words  with  the  same  starting 
time. 

Appendix  G  contains  a  more  detailed  list  of  the  word  recognition  results  summarized  in 
Table  4.5.  It  is  interesting  to  look  at  both  the  successes  and  failures  of  the  lexical  analyzer. 
Although  no  detailed  analysis  has  yet  been  made  of  the  errors,  some  samples  have  been  listed  in 
Table  4.6,  along  with  comments  as  appropriate. 

Further  details  on  the  lexical  analysis  algorithm  are  available  in  the  source  code  comments 
in  Appendix  C.  Finally,  Chapter  VI  includes  recommendations  on  how  the  word  recognition 
rates  of  Table  4.5  might  be  improved. 
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RANK  RANK 


of 


by 


may 


frantically 


in 


of 


run 


leap 


candy 


quick 


made 


my 


money 


are 


intelligence 


in 


become 


hard 


LOCAL 

WINNER 

GLOBAL 

WINNER 

fish 

fish 

of 

of 

buy 

buy 

may 

may 

frantically 

leaf 

in 

the 

of 

am 

run 

may 

leap 

knee 

candy 

tea 

quick 

wick 

may 

may 

mind 

mind 

my 

knee 

art 

in 

intent 

ten 

dim 

dim 

BE 

BE 

ARE 

ARE 

REMARKS 


first  4  are  local  and  global  winners 


no  attempt  to  resolve  homonyms 


local  best  but  not  global 


“the"  is  next  word  in  sentence 


"wick"  portion  of  word  stronger 


"may"  part  stronger 


“my"  part  real,  rest  is  fiction 


"knee"  part  stronger 


next  word  starts  with  a  stop 


more  confident  in  first  part 


nasals  are  hard  to  tell  apart 


missed  last  part  of  word 


missed  first  part  of  word 


Table  4.6.  Sample  Word  Recognition  Results 


E.  ANALYSIS  OF  REMAINING  ERRORS 

As  with  word  recognition  errors,  no  detailed  analysis  has  yet  been  conducted  of  remaining 
diphone  recognition  errors.  However,  Figure  4.23  shows  that  these  errors  are  at  least  fairly 
uniformly  distributed  among  diphones.  Again,  Chapter  VI  offers  recommendations  on  how  the 
remaining  errors  might  be  reduced. 

F.  SUMMARY 

The  results  reported  in  this  chapter  are  encouraging.  The  next  chapter  suggests  what  these 
results  imply  with  respect  to  the  basic  research  questions  of  this  thesis. 
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Percentage  of  Test  Set  Diphones  in  Top  3  for  157  Diphones 
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Figure  4.23.  Accuracy  rates  (top  3  metric)  for  157  diphones  of  Network  1  in  Table  4.3. 
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V.  CONCLUSIONS 


A.  INTRODUCTION 

This  chapter  briefly  summarizes  major  conclusions  that  may  be  drawn  from  the 
results  of  Chapter  IV.  In  particular,  the  research  questions  listed  in  Chapter  I  are 
addressed. 

B.  THE  DIPHONE  AS  A  RECOGNITION  UNIT 

The  results  reported  in  Chapter  IV  imply  that  the  diphone  is  indeed  a  suitable 
recognition  unit  for  speaker-independent  continuous-speech  recognition  systems.  As 
further  evidence  of  their  potential.  Table  5.1  compares  the  performance  of  two  diphone 
recognizors  (from  Tables  4.3  and  4.4)  with  two  recent  neural  network-based  phoneme 
recognition  systems. 


RECOGNITION 

UNIT 

NUMBER 

OF  OUTPUT 
CLASSES 

TRAINING 

SET 

CORRECT 

TRAINING 
TOP  3 

TEST  SET 
CORRECT 

TEST  SET 
TOP  3 

[Ref.  23] 

PHONEME 

46 

- 

- 

27.2% 

55.6% 

[Ref.  30] 

PHONEME 

47 

58.5% 

68%* 

52.3% 

74%* 

Table  4.4 

DIPHONE 

44 

100% 

100% 

71.5% 

95.7% 

Table  4.3 

DIPHONE 

157 

98.2% 

98.4% 

70.2% 

89.0% 

*  Estimated  from  Figure  2  in  the  reference 

Table  5.1 .  A  Comparison  of  Recognition  Rates  for  Several  Phoneme  and  Diphone  Recognizors. 


The  statistics  in  Table  5.1  suggest  that  the  diphone  is  inherently  easier  to  recognize 
from  its  acoustics  properties  than  is  the  phoneme.  Note  particularly  the  contrast  between 
the  first  three  networks  in  the  table.  With  nearly  the  same  number  of  output  classes,  the 
diphone  recognizor's  accuracy  rates  are  substantially  higher  in  every  category. 

C.  FREQUENCY  NORMALIZATION 

Although  more  study  is  needed,  this  research  suggests  that  pitch  can  serve  as  a  basis 
for  improving  speaker  independence  through  frequency  normalization.  Table  5.2  again 
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shows  the  apparent  impact  of  frequency  normalization  on  several  networks  (from 
Table  4.3) . 


■ 

Data 

Set* 

Test 

Correct 

Train 

Correct 

Train 
Top  3 

Frequency 

Normalized 

Time 

Alignment 

RMS 

Amplitude 

Window 

Width 

Remarks 

M30 

54.5% 

73.5% 

87.2% 

87.3% 

NO 

NO 

NO 

13 

Compare  w/4 

■ 

56.8% 

80.3% 

94,3% 

94.4% 

YES 

NO 

NO 

13 

Compare  w/3 

5 

M30 

52.0% 

72.0% 

84.7% 

85.1% 

NO 

NO 

NO 

13 

Compare  w/8 

■ 

79.0% 

91.1% 

91.1% 

YES 

NO 

NO 

13 

Compare  w/5 

The  medium  data  set  (30  examples  of  each  diphone  in  training  set)  is  indicated  by  M30.  The  large  data  se 
(100  examples  of  each  diphone  in  training  set)  is  labeled  LI  00. 

Table  5.2.  Comparison  of  Neural  Networks  With  and  Without  Frequency  Normalization 


D.  TIME  ALIGNMENT 

Proper  alignment  of  diphone  feature  vectors  with  the  recognition  window  has 
proven  to  be  absolutely  critical.  Because  of  the  dynamic  nature  of  diphones,  an  alignment 
error  of  just  one  recognition  window  position  can  drop  the  correct  output  from  first  place 
to  the  noise  level.  However,  this  research  shows  that  neural  network  feedback  can  be 
used  to  improve  alignment  prior  to  weight  adjustments.  Specifically,  the  best  results  were 
obtained  by  training  in  three  phases: 

•  Train  on  a  single  sample  of  each  diphone  until  the  average  sum-squared  error 
(total  SSE  divided  by  the  number  of  input  vectors)  is  on  the  order  of  10'^. 

•  Train  with  the  full  training  set  and  phonetic  transcription-based  alignment 
until  test  set  errors  reach  a  minimum. 

•  Resume  training  and  use  neural  network  feedback  to  position  the  training 
vectors  in  the  recognition  window.  To  avoid  unreasonable  window  positions,  the 
maximum  offset  from  the  transcription  position  should  be  constrained  (plus  or 
minus  about  40  milliseconds  seemed  adequate). 

During  recognition,  the  sharp  temporal  selectivity  of  the  trained  network  can 
actually  be  considered  an  advantage.  This  issue  is  discussed  further  in  the  next  chapter. 
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E.  TRANSLATION  OF  DIPHONES  TO  WORDS 


This  research  implies  that  continuous-speech  word  recognition  can  be  achieved 
using  diphones  as  the  recognition  unit.  Discrete  utterances  were  not  assumed  in  the 
algorithm  described  in  Chapters  HI  and  IV.  Yet  that  algorithm  usually  spotted  the  correct 
word  in  the  neural  network's  output  stream.  Unfortunately,  it  also  spotted  words  that 
were  not  present  in  the  speech  sample.  This  does  not  mean  that  the  basic  approach  is 
unworkable.  But  it  does  mean  that  the  syntactical  and  semantic  analyzers  will  need  to 
share  some  of  the  word  recognition  burden.  Specifically,  they  will  need  to  use  syntactic 
and  semantic  knowledge  to  eliminate  non-sensical  words  from  the  lexical  analyzer's 
output  stream.  This  may  not  be  as  difficult  or  unreasonable  as  it  seems.  After  all, 
humans  routinely  use  syntactic  and  semantic  knowledge  to  compensate  for  poor  acoustic 
signals.  One  factor  that  would  help  is  the  temporal  information  provided  by  the  lexical 
analyzer.  Once  a  word  is  recognized  with  some  confidence,  all  word  candidates 
overlapping  that  word  in  time  can  be  eliminated  from  further  consideration  (unless,  of 
course  the  system  is  forced  to  later  reject  that  word  and  backtrack).  The  process  can  also 
be  facilitated  by  the  use  of  word  segmentation  algorithms  such  as  those  currently  used  by 
discrete  speech  recognition  systems.  This  issue  is  discussed  further  in  the  next  chapter. 

F.  INPUT  VECTOR  DESIGN 

No  attempt  was  made  to  experiment  with  input  vector  alternatives  other  than  length 
and  frequency  normalization.  Since  the  input  vector  design  used  in  this  research  was 
based  largely  on  intuition,  some  aspects  of  that  design  may  not  be  optimal.  However,  the 
results  imply  that  it  has  some  merit.  In  particular,  the  design  probably  benefited  from  the 
inclusion  of  RMS  amplitude  (rather  than  short-time  power),  frequency  normalization  of 
differenced  cepstral  coefficients,  the  use  of  critical  bands,  and  a  length  of  about  289 
milliseconds. 
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G.  NEURAL  NETWORK  TRAINING  METHODS 


It  would  be  impossible  to  conduct  research  of  this  nature  without  drawing 
conclusions  about  which  neural  network  training  techniques  work  and  which  do  not.  The 
following  comments  summarize  the  key  lessons  learned: 

•  Use  a  single  hidden  layer  unless  there  are  very  compelling  reasons  to  use 
more. 

•  As  a  rough  rule  of  thumb,  the  hidden  layer  can  be  started  with  about  twice  as 
many  neurons  as  there  are  output  classes.  Neurons  can  then  be  pruned  or  added  as 
needed.  Of  course,  this  suggestion  is  very  specific  to  diphone  recognition  using 
the  feature  vector  described  in  Chapter  m. 

•  In  any  neural  network  classifications  system,  the  feature  vector  should  be 
carefully  designed  to  eliminate  frivolous  and  accentuate  relevant  features.  This 
may  require  extensive  preprocessing. 

•  Training  should  be  conducted  without  noise  first  [Ref.  9]. 

•  The  larger  the  number  of  training  exemplars,  the  better.  The  sequence  of 
presentation  should  be  randomized  (although  a  single  "shuffling"  may  suffice  if 
the  training  set  is  large). 

•  Training  exemplars  should  be  evenly  divided  between  output  classes. 

•  The  random  seed  should  be  set  explicitly  prior  to  weight  initialization.  This 
makes  results  repeatable  and  it  allows  for  deliberate  restarts  from  different  points 
on  the  error  surface. 

•  Weight  changes  should  not  be  batched  (at  least  not  when  training  with 
diphones). 

•  Momentum  and  an  adaptive  learning  rate  should  be  used  [Ref.  9]. 
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•  Use  a  separate  test  set  (on  every  epoch)  and  stop  training  when  test  set  errors 
reach  a  minimum. 

H.  SUMMARY 

While  the  conclusions  offered  in  this  chapter  are  encouraging,  many  questions 
remain.  The  next  chapter  will  suggest  avenues  of  research  that  may  address  some  of 
those  questions. 
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VI.  RECOMMENDATIONS  FOR  FURTHER  RESEARCH 


A.  INTRODUCTION 

Previous  chapters  included  remarks  on  a  few  of  the  areas  where  further  diphone-related 
research  is  needed.  For  example,  further  study  is  needed  to  determine  if  the  frequency 
normalization  method  introduced  in  Chapter  TV  can  be  improved  on  (perhaps  by  gathering  data 
on  other  vowels).  Also,  both  diphone  and  word  recognition  errors  that  remain  should  be  studied 
in  depth.  If  any  trends  are  identified,  they  may  inspire  improvements  to  the  neural  network’s 
feature  vector  design. 

Two  additional  research  areas  seem  promising  enough  to  justify  particular  emphasis  in  this 
chapter.  They  are  summarized  in  the  next  two  sections. 

B.  MODULAR  NEURAL  NETWORKS 

As  illustrated  in  Table  4.4,  extremely  high  recognition  rates  can  be  achieved  by  neural 
networks  trained  to  recognize  up  to  about  50  diphones.  Yet  large  vocabularies  require  over  1000 
diphones.  The  accuracy  levels  achievable  on  157  diphones  are  encouraging.  However,  modular 
neural  networks  may  scale  up  better  than  the  monolithic  neural  networks  used  in  this  research. 
Modular  networks  are  discussed  in  depth  in  [Ref.  13].  The  basic  idea  is  to  break  the  problem 
into  smaller  subproblems,  each  assigned  to  its  own  subnetwork.  A  top-level  network  then  selects 
a  winner  from  the  outputs  of  the  subnetworks. 

Average  duration  might  be  a  useful  basis  for  partitioning  the  diphones.  The  recognition 
window  length  of  each  subnetwork  could  then  be  more  closely  matched  to  the  lengths  of  the 
diphones  to  be  recognized.  Diphones  could  also  be  partitioned  based  on  the  first  phoneme.  This 
approach  was  used  successfully  in  the  two  diphone  recognizors  of  Table  4.4.  In  either  case,  the 
overall  recognition  rate  should  be  quite  good  if  the  top-level  network  also  performs  well. 

C.  LEXICAL  ANALYZER  IMPROVEMENTS 

Unfortunately,  time  did  not  permit  extensive  experimentation  with  the  lexical  analyzer.  In 
particular,  improvements  in  two  areas  might  increase  word  recognition  accuracy.  The  first  area 
is  in  the  method  used  to  compute  the  word  confidence  level.  This  is  currently  just  the  average  of 
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the  confidence  levels  (neural  network  output  values)  of  the  diphones  used  to  build  the  word. 
This  approach  has  several  disadvantages: 

•  The  lexical  analyzer  tends  to  favor  pieces  of  words,  if  those  pieces  are  themselves 
words  and  the  pieces  have  higher  average  confidence  values  than  the  correct  word. 
Both  the  complete  word  and  its  pieces  will  be  in  the  output  stream,  but  the  syntactic 
and  semantic  analyzers  will  have  to  work  harder  to  find  the  correct  word  if  it  is  ranked 
lower. 

•  Short  words  can  have  their  confidence  values  unduly  influenced  by  one  or  two 
diphones.  A  single  strong  diphone  can  rocket  the  word  to  the  top  of  the  list. 

Similarly,  a  single  weak  diphone  can  bury  the  word  at  the  bottom  of  the  candidate  list, 
or  even  cause  it  to  be  missed  altogether. 

•  Words  with  only  half  of  their  diphones  actually  detected  (every  other  diphone  is 
sufficient  for  word  detection)  may  score  higher  than  words  with  every  diphone 
detected  (but  a  lower  average  confidence  value).  This  is  counter-intuitive. 

Expert  system  developers  face  similar  evidence  weighting  problems.  Unfortunately,  the 
uncertainty  management  methods  typically  applied  to  expert  systems  do  not  (at  first  glance)  seem 
to  fit  this  problem  well. 

The  second  area  of  the  lexical  analyzer  that  deserves  scrutiny  is  related  to  the  first.  Other 
than  rejecting  diphones  that  appear  in  the  stream  unreasonably  early  or  late,  temporal  information 
is  not  incorporated  into  the  word  confidence  value  calculations.  It  seems  likely  that  the  times 
between  two  given  diphones  would  be  normally  distributed.  Diphones  which  appear  earlier  or 
later  than  the  expected  time  should,  it  seems,  be  penalized.  Although  it  would  add  a  small 
amount  of  complexity  to  the  algorithm,  doing  so  might  greatly  improve  the  lexical  analyzer's 
discrimination. 

D.  SUMMARY  AND  CONCLUSIONS 

Many  questions  remain  unanswered.  And  the  methods  described  in  this  thesis  may  require 
some  adaptation  before  they  can  be  applied  to  practical  systems.  But  this  research  implies  that 
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the  diphone  is  indeed  a  promising  recognition  unit.  Also,  pitch-based  frequency  normalization 
and  feedback-based  time  alignment  appear  to  be  potentially  useful  for  speech  recognition  in 
general,  and  diphone  recognition  in  particular.  Hopefully,  these  techniques  will  bring  the  field  a 
small  step  closer  to  the  day  when  research  can  focus  on  the  most  serious  challenges  of  speech 
recognition,  semantic  and  pragmatic  analysis. 
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APPENDIX  A.  MATLAB  CODE 


A.  CONTENTS 

1.  TRNBPX.M 

This  is  a  script  for  training  a  neural  network  to  recognize  diphones.  It  requires  a  text  file 
listing  precomputed  feature  vectors  (files)  produced  by  MAKVECTS.M. 

2.  RECOGNIZ.M 

This  script  file  produces  a  text  file  listing  diphones  detected  in  designated  speech  (.WAV) 
files.  Output  from  this  script  is  used  by  the  lexical  analyzer  for  word  recognition. 

3.  FEATURES.M 

This  function  returns  a  feature  surface  for  a  designated  portion  of  a  speech  (.WAV)  file. 
Used  by  MAKVECTS.M. 

4.  FEATUREL.M 

Same  as  FEATURES  .M  except  the  feature  surfaces  are  for  a  word  instead  of  a  single 
diphone.  Used  by  RECOGNIZ.M. 

5.  TESTNET.M 

This  script  produces  a  detailed  report  of  test  set  errors  for  a  given  set  of  network  weights. 

6.  MAKVECTS.M 

This  script  computes  feature  surfaces  for  a  list  of  speech  files.  The  speech  file  list  must 
include  a  diphone  name  and  sample  numbers  for  each  file.  See  Appendix  B  for  the  utilities  used 
to  produce  that  list. 

7.  INPUTS.M 
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This  script  obtains  training  parameters  from  the  user  before  training  the  network.  Invoked 
byTRNBPX.M. 

8.  P_AMDF.M 

This  function  returns  the  speaker's  apparent  pitch  for  a  sample  of  speech.  Uses  the 
Average  Magnitude  Difference  Function  (AMDF). 

9.  P_CEPS.M 

This  function  returns  the  speaker's  apparent  pitch  for  a  sample  of  speech.  Uses  cepstral 
pitch  detection. 

10.  VECTORIZ.M 

This  function  is  used  by  TRNBPX.M  to  convert  a  feature  surface  into  a  feature  vector. 

11.  DIPNUMOF.M 

This  function  is  used  by  TRNBPX.M  to  return  an  index  number  for  a  given  diphone  name. 

12.  DIFF_MAT.M 

This  function  is  used  by  MAKVECTS  to  compute  the  differenced  version  of  a  feature 
surface. 

13.  LD16BIT.M 

This  function  (from  the  NPS  SPC  Toolbox)  loads  a  PCM  speech  file. 

14.  LESSTHAN.M 

This  is  a  less  than  function  for  use  in  the  binary  search  performed  by  DIPNUMOF.M. 

15.  STRCAT.M 

This  is  a  simple  string  concatenation  function. 
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B. 


TRNBPX.M 


%  TRNBPX  -  Trains  a  neural  network  to  recognize  diphones 

% 

%  This  code  contains  many  patches  and  a  great  deal  of  scaffolding  for  the 
%  purpose  of  trying  out  various  training  alternatives.  Like  most  prototypes, 

%  it  should  be  thrown  out  and  replaced  with  something  cleaner  and  more 
%  efficient. 

% 

%  User  defineable  parameters  are  loaded  from  inputs.mat,  if  it  exists. 

%  Othenwise,  INPUTS.M  is  invoked  to  obtain  settings  from  the  user. 

%  Remove  inputs.mat  before  invoking  TRNBPX.M  if  you  wish  to  change 
%  settings  from  the  previous  run. 

% 

%  The  following  text  files  must  reside  in  the  current  directory: 

% 

%  (1)  A  file  listing  the  diphones  to  be  recognized  (one  diphone 
%  per  line,  a  single  space  separating  the  phonemes  of  the  diphone, 

%  and  all  caps) 

% 

%  (2)  A  file  listing  the  training  MAT  files  that  were  created 
%  byMAKVECTS.M 
% 

%  (3)  [optionally]  a  file  listing  the  test  MAT  files  that  were  created 
%  byMAKVECTS.M 

% 

%  Written  by:  Major  Mark  Cantrell,  USMC 

% 

%  Modified  1 1  Feb  96  (new  time  alignment  method) 

%  12  Feb  96  (trimmed  fat  to  speed  up  code) 

%  26  Feb  96  (re-anchored  ref  set  in  middle  of  recog  window) 

%  2  Mar  96  (Test  files  now  use  window  center  if  time  align  off) 

%  7  Mar  96  (Fixed  bug  in  binary  search  from  12  Feb  changes) 

%  8  Mar  96  (Once  again  unleashed  ref  set) 

%  17  Apr  96  *  User  inputs  moved  to  INPUTS.M 

%  *  Static  file  offsets  optionally  loaded  from  mat  file 

%  *  Randomness  inserted  in  training  file  sequencing 

%  20  Apr  96  (Allowed  test  set  time  alignment  based  on  max_win_offset) 

%  18  May  96  (Patched  to  use  RMS  Amplitude  vice  Short  Time  Power) 

%  13  Jun  96  (fixed  inconsistencies  in  use  of  w1  versus  new_w1  and 

%  b1  versus  new_b1  [same  for  w2  &  b2]) 

% 

%%%%%%%%%  Distribution  and  Copyright  %%%%%%%%%%%%%%%%%%%%%%%%%%% 
% 

%  This  prologue  must  be  included  in  all  copies  of  this  software. 

% 

%  This  software  is  released  to  the  Public  Domain 

% 

%  Note: 

% 

%  Software  released  to  the  Public  Domain  is  not  subject 
%  to  copyright  protection. 

% 

%  The  MathWorks,  Inc.  holds  the  copyright  on  the  MATLAB 
%  Neural  Network  Toolbox  functions  invoked  by  this  script  file. 


107 


%  Restrictions  on  use  or  distribution:  NONE 
% 

%%%%%%%%%  Disclaimer  %%%%%%%%%%%%%%%%- 
% 

%  This  software  and  its  documentation  are  provided  "AS  IS"  and 
%  without  any  expressed  or  implied  warranties  whatsoever. 

%  No  warranties  as  to  performance,  merchantability,  or  fitness 
%  for  a  particular  purpose  exist. 

% 

%  Because  of  the  diversity  of  conditions  and  hardware  under 
%  which  this  software  may  be  used,  no  warranty  of  fitness  for 
%  a  particular  purpose  is  offered.  The  user  is  advised  to 
%  test  the  software  thoroughly  before  relying  on  it.  The  user 
%  must  assume  the  entire  risk  and  liability  of  using  this 
%  software. 

% 

%  In  no  event  shall  any  person  or  organization  of  people  be 
%  held  responsible  for  any  direct,  indirect,  consequential 
%  or  inconsequential  damages  or  lost  profits. 

% 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

%  LOAD  USER  DEFINABLE  PARAMETERS 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


if  exist('inputs.mat’) 
load('inputs'); 
else 
inputs; 
end 

revision  =  '12.0  (LI)'; 
pack 

nice_start  =  6;  %  this  is  hour  when  training  slowdown  starts 
nice_stop  =  18;  %  this  is  hour  when  training  slowdown  stops 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

%  TRAINING  PARAMETERS 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


if  ~exist('seed')  %  start  rand  num  gen  at  known  point  each  time  to 
seed  =  0;  %  ensure  that  any  performance  changes  aren't  result 

end  %  of  different  start  pos 

rand('seed',seed); 


Ir  =  0.01 ;  %  Learning  rate,  0.01 . 

im  =  1 .05;  %  Learning  rate  increase,  default  =  1 .05. 

dm  =  0.7;  %  Learning  rate  decrease,  default  =  0.7. 

f1='logsig';  %  hidden  layer's  transfer  function 
f2='logsig';  %  output  layer's  transfer  function 
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[hidden_size  input_size]  =  size(w1); 
win_width=input_size/30; 

mid_pos  =  round((27-win_width)/2)  + 1 ; 
win_slack  =  27-win_width+1 ; 

win_scan_start  =  14  -  (round(win_width/2)  +  max_win_offset  - 1); 
win_scan_stop  =  win_scan_start  +  (2  *  max_win_offset); 
if  win_scan_start  <  1 
win_scan_start  =  1 ; 
end 

if  win_scan_stop  >  win_slack 
win_scan_stop  =  win_slack; 
end 

reportjilename  =  'nnreport.txt'; 


ste_start  =  29*win_width+1 ;  %  part  of  STE  patch 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  LOAD  THE  LIST  OF  DIPHONES  TO  BE  RECOGNIZED 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


num_dips  =  0; 
dip_names  =  zeros(575,7); 

if  lfname=0 
return 
else 

lfpname=strcat(lpname,lfname); 
list_file  =  fopen(lfpname); 
line_read  = ' 

while  (~feof(list_file))  &  (length(line_read)>2)  %  for  every  diphones  listed 
line_read  =  upper(fgetl(list_file)); 
if  length(line_read)  >  2 
num_dips  =  num_dips  +  1; 

dip_names(num_dips,:)  =  blanks(length(dip_names{num_dips,:))); 
dip_names(num_dips,1:length(line_read))  =  line_read; 
end 
end 

fclose(list_file); 

end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%%%'»/o%%%o/o%%o/o%%%% 

%  open  training  file  list 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

if  mfname==0 
return 
end 

mfpname=strcat(mpname,mfname); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/oO/o%%%o/c,%%o/o%%%%'>/o% 

%  open  test  file  list 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%%%%%%o/o'%o/^ 
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if  sep_test_data  =  1 

if  tfname=0 
return 
end 

tfpname=strcat(tpname,tfname); 

end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

%  INITIALIZE  THE  NETWORK 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


flops(O);  %  reset  flop  counter 
gigaflops  =  0; 

if  (reset_weights  ==  1 )  %  randomize  weights  &  reset  epoch  counter 

flops(O);  %  reset  counters 
gigaflops  =  0; 

epoch_count  =  1 ; 

best_correct  =  0;  %  used  to  detennine  when  a  set  of  wts 

best_wayoff  =  inf;  %  is  best  so  far 

best_SSE  =  inf; 

SSE_record  =  zeros(1 ,1); 
t_SSE_record  =  zeros(1 ,1); 

disp(strcat('lnitializing  the  neural  network  for  output  layer  size  of  ',num2str(num_dips))); 

inputjimits  =  ones(win_width*30,2):  %  input  vector  is  win_width*30  inputs 
inputjimits{1  :win_width*29,1 )  =  -10*  input_limits(1  :win_width*29,1 );  %  freq  mins 

input_limits(1  :win_width*29,2)  =  10  *  inputjimits(1  :win_width*29,2);  %  freq  maxs 

inputjimits(win_width*29+1  :win_width*30,1 )  =  -30  *  inputjimits(win_width*29+1  :win_width*30,1 );  % 
freq  mins 

inputjimits(win_width*29+1  :win_width*30,2)  =  30  *  input_limits(win_width*29+1  :win_width*30,2):  % 

freq  maxs 

[w1  ,b1  ,w2,b2]  =  initff(input_limits,hidden_size,'logsig',num_dips,'logsig'); 
else  %  resuming  with  same  weights  so  use  same  epoch  count  &  record 
epoch_count  =  epoch_count  + 1; 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  final  setup  for  training  the  network 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

dfl  =feval(f1, 'delta'); 
df2  =  feval(f2, 'delta'); 
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dw1  =  w1*0; 

db1  =brO; 

dw2  =  w2*0; 
db2  =  b2*0; 

MC  =  0: 

last_SSE  =  inf; 

new_w1  =  w1 ; 
new_b1  =  b1 ; 
new_w2  =  w2; 
new_b2  =  b2; 


%  write  header  to  report  file 
report_file  =  fopen(report_filename,'at+'): 

fprintf(report_file,'%s'^',' - '); 

fprintf(reportJile,'%s%s%s\n\n','NNET  TRAINING  RECORD  Ver.  revision', ‘ 

(Epochs, SSEs,#Correct,#MoreThan3off)'); 

fprintf(report_file,'%s\n',remarks); 

fprintf(report_file,'%s\n',remarks2); 

fprintf(report_file,'win_width  =  %d,  max_offset  =  %d,  seed  =  %d\n',win_width,max_win_offset,seed); 
fprintf(report_file,'%s/%s:  Hidden  =  %d,  Num_Dips  =  %d  ,  me  =  %f ,  er  = 
'%>f\n\n’,f1,f2,hidden_size,num_dips,mc,er): 
tv  =  fix(clock); 

time_stamp  =  strcat(’Start  Time  =  ',num2str(tv(1))); 
time_stamp  =  strcat(time_stamp,'/’); 
time_stamp  =  strcat(time_stamp,num2str(tv(2))): 
time_stamp  =  strcat(time_stamp,‘/'); 
time_stamp  =  strcat(time_stamp,num2str(tv(3))); 
time_stamp  =  strcat(time_stamp,'  '); 
time_stamp  =  strcat(time_stamp,num2str(tv(4))); 
time_stamp  =  strcat(time_stamp,':'); 
time_stamp  =  strcat(time_stamp,num2str(tv(5))); 
time_stamp  =  strcat(time_stamp,‘:'); 
time_stamp  =  strcat(time_stamp,num2str(tv(6))); 
fprintf(report_file,'%s\n',time_stamp); 
fclose(report_file): 

if  (time_align  =  1)  &  (static_offsets  ==  1) 
load(offname): 
else 

train_offsets  =  zeros(num_dips,1 );  %  for  recording  recognition  window  offsets 

end 

test_offsets  =  zeros(1 ,1 ); 

if  singleJnstance_only  ==  1 
ref_dips  =  zeros(1,1); 
end 

pass_num  =  1; 


111 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  START  OF  EPOCH  LOOP 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


for  epoch_count  =  epoch_count:me  %  epoch  loop  (one  pass  through  training  list) 

%  for  2nd  &  subsequent  passes  through  training  files, 

%  generate  a  random  integer  for  each  training  file  to 
%  allow  shuffling  of  presentation  sequence  (see  comments 
%  below) 

if  pass_num  >  1 

rand_array  =  round(10*rand(1,length(train_offsets))): 
max_draws  =  10; 
else 

max_draws  =  0;  %  no  shuffling  on  1st  pass  through  files 
end 

tv  =  fix(clock): 

disp(strcat(num2str(epoch_count),' - [TRAINING] - ')); 

disp(tv); 

correct  =  0; 
wrong_by_alot  =  0; 

SSE  =  0; 

dip_record  =  zeros(num_dips,3):  %  for  each  dip  in  a  pass:  #  seen,  #  correct,  #  way  off 
num_found  =  0; 

matjistjile  =  fopen(mfpname); 

for  dip_set  =  0:max_draws  %  make  max_draws+1  passes  thru  files,  only  training  on 

%  a  file  when  rand_array(file_num)  ==  dip_set. 

%  This  results  in  shuffling  sequence  of  training 
%  file  presentations  each  epoch.  On  1st  epoch,  all 
%  files  are  presented  in  file  list  sequence  and  in 
%  a  single  pass  through  the  file  list. 

Iine_read  =  ’ 
dip_name  =  blanks(7); 
file_num  =  0; 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  training  file  loop 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%%%%% 


while  (-feof(mat_list_file))  &  (length(line_read)>0)  %  for  every  MAT  file 
line_read  =  (fgetl(mat_list_file)): 
file_num  =  file_num  +  1; 

%  if  line  not  blank  &  this  file's  number  is  up  (or  1st  pass  thru  files) 
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if  (pass_num  >  1)  &  (length(line_read)  >  2) 
if  (rand_array(file_num)  ==  dip_set) 
file_Gnable  =  1 ; 
else 

file_enable  =  0; 
end 
else 

file_enable  =  1 ; 

if  length(train_offsets)  <  file_num 
train_offsets(file_num,1)  =  mid_pos: 
end 
end 

if  file_enable  ==  1 

if  (length(line_read)  >  2) 
load  nice.txt  -ascii; 
niceness=nice; 


%sleep  for  niceness  seconds  if  during  work  hours 

tv  =  fix(clock); 

if  (niceness  >  0)  &  (tv(4)  >  nice_start)  &  (tv(4)  <  nice_stop) 
pause(niceness); 
end 

filename  =  line_read;  %  used  to  load  file 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  pick  out  diphone  name  from  filename 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

dotJoc=findstr(filename,'.'); 
if  dotJooO 

dip_str=filename(1  :dot_loc-1 ); 
j  =  length(dip_str); 

while  (dip_str(j)  >=  '0')  &  (dip_str(j)  <=  '9') 

end 
i  —  ji 

while  (dip_str(i)  -= '/')  &  (dip_str(i)  -=  &  (i  >  1) 

i  =  i-1; 
end 
if  i  >  1 
i  =  i  +  1; 
end 

dip_name  =  blanks(7); 
dip_name(1:j-i+1)  =  upper(dip_str(i:j)); 
for  i  =  1  :iength(dip_name)  %  get  rid  of  hyphen 

if  dip_name(i)  == 
dip_name(i)  = ' 
break; 
end 
end 
else 

error(strcat('Bad  MAT  filename:  '.filename)); 
end 
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elseif  length(line_read)  <  3 
break;  %  bail  out  of  eof  loop  if  line  blank 

end 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  determine  target  output  vector 

%  note  that  dipnames  file  MUST  be  alpha  sorted  &  pound  ('#')  signs  replaced  by  'X' !!! 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%  use  bin  search  to  find  dip's  column  number 
dip_num  =  dipnumof(dip_name,dip_names(1:num_dips,:)): 

if  strcmp(dip_names(dip_num,:),dip_name)  ~=  1 

error(strcat('ls  dip  list  sorted  &  are  #s  replaced  by  Xs?  Unable  to  find  ',dip_name)); 
end 

%  target  vector  is  all  zeroes  except  a  1  at  correct  dip  col 
t  =  zeros(num_dips,1): 
t(dip_num)  =  1; 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  if  training  on  full  set  or  this  is  first  example  of  this  diphone,  load  MAT  file 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


if  pass_num  >  1 
if  ref_dips(file_num,1)  >  0 
ref_dip=  1; 
else 

ref_dip  =  0; 
end 
else 

if  dip_record(dip_num,1)  <  1 
ref_dip  =  1; 

ref_dips(file_num,1)  =  1; 
else 

ref_dip  =  0; 

ref_dips(file_num,1)  =  0; 
end 
end 


if  (singleJnstance_only  ==  0)  I  (ref_dip  ==  1) 

%  load  mat  file 
load(filename,'-mat'): 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  if  this  is  first  pass  or  (or  every  fifth  pass)  thru  the  files... 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


if  (pass_num  ==  1)  I  (round(pass_num/5)  ==  (pass_num/5)) 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  ...  and  not  time  aligning 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
if  (time_align  ==  0) 
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train_offsets(file_num,1 )  =  mid_pos:  %  center  recognition  window  per  timit  transcription 
win_pos  =  mid_pos; 
besLpos  =  mid_pos: 
win_pos  =  mid_pos; 


%  sequence  each  column  from  matrix  into  vector 
p  =  vectoriz(feature_surf_a,win_width,win_pos): 

ste=p(ste_start:input_size):  %  this  is  patch  to  convert  STPwr  to  RMS  Amplitude 
p(ste_start:input_size)=sign(ste).*sqrt(abs(ste)); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  ...elseif  applicable,  use  window  offset  from  static  offsets  file 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
elseif  static_offsets  ==  1 

win_pos  =  train_offsets(file_num,  1 );  %  center  recognition  window  per  offsets  file 
best_pos  =  win_pos; 


%  sequence  each  column  from  matrix  into  vector 

if  win_pos  <  (win_slack+1) 
p  =  vectoriz(feature_surf_a,win_width,win_pos): 

ste=p(ste_start:input_size);  %  this  is  patch  to  convert  STPwr  to  RMS  Amplitude 
p(ste_start:input_size)=sign(ste).*sqrt(abs{ste)); 
else 

win_pos  =  win_pos  -  win_slack; 
p  =  vectoriz(feature_surf_b,win_width,win_pos); 

ste=p(ste_start:input_size):  %  this  is  patch  to  convert  STPwr  to  RMS  Amplitude 
p(ste_start:input_size)=sign(ste).*sqrt(abs(ste)): 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  ...else  use  NNET  feedback  to  time  align  vector. 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%o/o%%%%% 


else 


best_out  =  -inf; 
best_pos  =  0; 
best_rank  =  num_dips: 


%  look  for  best  window  pos  in  vector  a 
for  win_pos  =  win_scan_start:win_scan_stop 

%  sequence  each  column  from  matrix  into  vector 
p  =  vectoriz(feature_surf_a,win_width,win_pos); 

ste=p(ste_start:input_size):  %  this  is  patch  to  convert  STPwr  to  RMS  Amplitude 
p(ste_start:input_size)=sign(ste).*sqrt(abs(ste)); 

if  batch  ==  1 
a1  =  feval(f  1  ,w1  *p,b1 ); 
a2  =  feval(f2,w2*a1  ,b2); 
else 

a1  =  feval(f1  ,new_w1  *p,new_b1 ); 
a2  =  feval(f2,new_w2*a1  ,new_b2); 
end 


115 


a2_copy  =  a2; 

[max_val  max_pos]  =  max{a2_copy); 

a2_copy(max_pos)  =  0; 
test_wrong_by  =  0; 
sum_winners  =  0; 
while  max_pos  ~=  dip_num 
sum_winners  =  sum_winners  +  a2_copy(max_pos): 

[max_val  max_pos]  =  max(a2_copy); 
a2_copy(max_pos)  =  0; 
test_wrong_by  =  test_wrong_by  +1 ; 
end 

if  (test_wrong_by  <  best_rank) 
best_rank  =  test_wrong_by; 
best_out  =  a2{dip_num)  -  sum_winners: 
best_pos  =  win_pos; 
best_p  =  p: 

elseif  {test_wrong_by  ==  best_rank)&((a2(dip_num)  -  sum_winners)  >  best_out) 
best_out  =  a2(dip_num)  -  sum_winners: 
best_pos  =  win_pos; 
best_p  =  p; 
end 

end 


%  look  for  best  window  pos  in  vector  b 
for  win_pos  =  win_scan_start:win_scan_stop 

%  sequence  each  column  from  matrix  into  vector 
p  =  vectoriz(feature_surf_b,win_width,winj30s): 

ste=p{ste_start:input_size);  %  this  is  patch  to  convert  STPwr  to  RMS  Amplitude 
p(ste_start:input_size)=sign(ste).*sqrt(abs(ste)); 

if  batch  ==  1 
a1  =  feval(f1  ,w1  *p,b1 ); 
a2  =  feval(f2,w2*a1  ,b2); 
else 

a1  =  feval(f1  ,new_w1  *p,new_b1 ): 
a2  =  feval(f2,new_w2*a1,new_b2); 
end 


a2_copy  =  a2; 

[max_val  max_pos]  =  max(a2_copy): 

a2_copy(max_pos)  =  0; 
test_wrong_by  =  0; 
sum_winners  =  0; 
while  max_pos  ~=  dip_num 
sum_winners  =  sum_winners  +  a2_copy(max_pos); 
[max_val  max_pos]  =  max(a2_copy); 
a2_copy{max_pos)  =  0; 
test_wrong_by  =  test_wrong_by  +1 ; 
end 
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if  (test_wrong_by  <  best_rank) 
best_rank  =  test_wrong_by: 
best_out  =  a2(dip_num)  -  sum_winners: 
best_pos  =  win_pos  +  win.slack; 
best_p  =  p: 

elseif  (test_wrong_by  =  best_rank)&((a2(dip_num)  -  sum_winners)  >  best_out) 
best_out  =  a2(dip_num)  -  sum_winners: 
best_pos  =  win_pos  +  win.slack; 
best_p  =  p; 
end 

end 

train_offsets(file_num,1 )  =  best_pos:  %  remember  this  window  pos  for  next  epoch 

p  =  best_p: 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%%%%%%%%%% 

%  ...eise  load  correct  window  offset  from  array 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%%%%%%%%%%%%% 

else 

win_pos  =  train_offsets(file_num); 
best_pos  =  winjjos; 
if  win_pos  <  (win_slack+1) 

%  sequence  each  column  from  matrix  into  vector 
p  =  vectoriz(feature_surf_a,win_width,win_pos); 

ste=p(ste_start:input_size):  %  this  is  patch  to  convert  STPwr  to  RMS  Amplitude 
p(ste_start:input_size)=sign(ste).*sqrt(abs(ste)); 

else 

win_pos  =  win_pos  -  win_slack; 

%  sequence  each  column  from  matrix  into  vector 
p  =  vectoriz(feature_surf_b,win_width,win_pos); 

ste=p(ste_start:input_size):  %  this  is  patch  to  convert  STPwr  to  RMS  Amplitude 
p(ste_start:input_size)=sign(ste).*sqrt(abs(ste)); 

end 

end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%%%%%% 

%  Determine  outputs  of  layers 

%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%'>/o%%%<>/o%%%«>/o%%%'>/o'»/o%%%%%% 

if  batch  ==  1 

a1  =fevai(f1,w1*p,b1); 
a2  =  fevai(f2,w2*a1  ,b2): 
else 

a1  =  feval(f  1  ,new_w1  •p,new_b1 ); 
a2  =  feval(f2,new_w2*a1  ,new_b2); 
end 

e  =  t  -  a2: 
str  =  ■ 
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%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  record  whether  NNET  output  was  correct  or  way  off 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


[max_val  max_pos]  =  max(a2): 
if  (dip_record(dip_num,1)  <  1) 
numjound  =  numjound  +  1 ; 
end 

dip_record(dip_num,1)  =  dip_record(dip_num,1)  +  1; 
if  max_pos  ~=  dip_num 
a2_copy  =  a2; 
a2_copy(max_pos)  =  0; 
wrong_by  =  0; 
while  max_pos  ~=  dip_num 
[max_val  maxjjos]  =  max(a2_copy); 
a2_copy(max_4Jos)  =  0; 
wrong_by  =  wrong_by  +1 ; 
end 

str  =  strcat(strcat(strcat(str,'  (wrong  by  ■),num2str(wrong_by)),‘)'): 
if  wrong_by  >  2 

wrong_by_alot  =  wrong_by_alot+1 ; 
dip_record(dip_num,3)  =  dip_record(dip_num,3)  +  1 ; 
end 
else 

correct  =  correct+1 ; 

dip_record(dip_num,2)  =  dip_record(dip_num,2)  +  1 ; 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  BACKPROPAGATION  -  Call  approp  delta  function 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


d2  =  feval(df2,a2,e): 
if  batch  ==  1 

d1  =feval(df1,a1,d2,w2); 
else 

d1  =  feval(df1  ,a1  ,d2,new_w2): 
end 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  LEARNING  -  Use  backprop  to  find  weight  changes 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


[dw1  ,db1]  =  Iearnbpm(p,d1  ,lr,MC,dw1  ,db1): 

[dw2,db2i  =  Iearnbpm(a1  ,d2,lr,MC,dw2,db2); 

new_w1  =  new_w1  +  dw1 ;  %  adjust  temp  wts  &  bias's 

new_b1  =  new_b1  +  db1 ; 
new_w2  =  new_w2  +  dw2; 
new_b2  =  new_b2  +  db2; 


%  Determine  effect  of  new  weights  on  SSE 

if  batch  ==  1 

a1  =  feval(f1  ,new_w1  *p,new_b1 ); 
a2  =  feval(f2,new_w2*a1,new_b2); 
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e  =  t  -  a2: 
end 

SSE  =  SSE  +  sumsqr(e): 
end  %  end  of  ‘if  single  instance..." 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  break  out  of  file  loop  if  one  of  each  dip  found  &  only  one  needed 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


if  (num_found  ==  num_dips)  &  (single_instance_only  ==  1) 
if  (pass_num  >  1) 
break; 
end 
end 

end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  end  of  training  set  "while  not(eoO“  loop 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

end 

frewind(matjist_file); 

end  %  end  of  file  shuffling  loop 
fclose(matJist_file); 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%%%%%%%%%%%%%%%%%%  TEST  FILE  CODE  %%%%%%%%%%%%%%%%%% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


MC  =  me; 


if  (singleJnstance_only  =  0)  &  {sep_test_data  ==  1) 

disp(strcat(num2str(correct),strcatC  correct  and  ',strcat(num2str(wrong_by_alot),'  way  off')))); 

disp{strcat(num2str(epoch_count),' - [TESTING] - ')); 

disp(fix(clock)); 

test_file  =  fopen(tfpname); 

line_read  = ' '; 
dip_name  =  blanks(7); 
test_file_num  =  0; 
test_SSE  =  0; 
test_correct  =  0; 
test_wrong_by_alot  =  0; 

test_dip_record  =  zeros(num_dips,3);  %  for  each  dip  in  last  pass:  #  seen,  #  correct,  #  way  off 
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%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  test  file  loop 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

while  (-feof(test_flle))  &  (length(line_read)>0)  %  for  every  test  file 
line_read  =  (fgetl(test_file)); 
if  length(line_read)  >  2 

load  nice.txt  -ascii; 
niceness=nice; 


%sleep  for  niceness  seconds  if  during  work  hours 

tv  =  fix(clock); 

if  (niceness  >  0)  &  {tv(4)  >  nice_start)  &  (tv(4)  <  nice_stop) 
pause(niceness); 
end 

filename  =  line_read;  %  used  to  load  file 
test_file_num  =  test_file_num  + 1; 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  pick  out  diphone  namefrom  filename 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


dotJoc=findstr(filename,'.'); 
if  dotJooO 

dip_str=filename(1  :dot_loc-1 ); 
j  =  length(dip_str); 

while  (dip_str(j)  >=  '0')  &  (dip_str(j)  <=  '9') 
end 

i  =  j: 

while  (dip_str(i)  ~= '/’)  &  (dip_str(i)  -=  Y)  &  (i  >  1) 
i  =  i-1; 
end 
if  i  >  1 
i  =  i  +  1; 
end 

dip_name  =  blanks(7); 
dip_name(1:j-i+1)  =  upper(dip_str(i:j)); 
for  i  =  1  :length(dip_name)  %  get  rid  of  hyphen 

if  dip_name(i)  == 
dip_name(i)  = ' 
break; 
end 
end 
else 

error(strcat('Bad  TEST  filename:  '.filename)); 
end 

else 

break;  %  bail  out  of  eof  loop  if  line  blank 

end 
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%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  determine  target  output  vector 

%  note  that  dipnames  file  MUST  be  alpha  sorted  &  pound  ('#')  signs  replaced  by  'X' !!! 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%  use  bin  search  to  find  dip's  column  number 
dip_num  =  dipnumof(dip_name,dip_names(1;num_dips,;)); 

if  strcmp(dip_names(dip_num,:),dip_name)  -=  1 

error(strcat('ls  dip  list  sorted  &  are  #s  replaced  by  Xs?  Unable  to  find  ',dip_name)); 
end 

%  target  vector  is  all  zeroes  except  a  1  at  correct  dip  col 
t  =  zeros(num_dips,1); 
t(dip_num)  =  1 ; 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  load  test  MAT  file 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


load(filename,  '-mat'); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  use  NNET  feedback  to  time  align  input  vectors 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


best_out  =  -inf; 
best_pos  =  0; 
best_rank  =  num_dips; 

%  look  for  best  window  pos  in  vector  a 
for  win_pos  =  win_scan_start:win_scan_stop 


%  sequence  each  column  from  matrix  into  vector 
p  =  vectoriz(feature_surf_a,win_width,win_pos); 

ste=p(ste_start:input_size);  %  this  is  patch  to  convert  STPwr  to  RMS  Amplitude 
p(ste_start:input_size)=sign(ste).*sqrt(abs(ste)): 

at  =  feval(f  1  ,w1  *p,b1 ); 
a2  =  feval(f2,w2*a1,b2); 

a2_copy  =  a2; 

[max_val  max_pos]  =  max(a2_copy): 

a2_copy(max_pos)  =  0; 
test_wrong_by  =  0; 
sum_winners  =  0; 
while  max_pos  ~=  dip_num 
sum_winners  =  sum_winners  +  a2_copy(max_pos): 

[max_val  max_pos]  =  max(a2_copy); 
a2_copy(max_pos)  =  0; 
test_wrong_by  =  test_wrong_by  +1 ; 
end 

if  (test_wrong_by  <  best_rank) 
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best_rank  =  test_wrong_by: 
best_out  =  a2(dip_num)  -  sum_winners; 
best_pos  =  win_pos; 
best_p  =  p: 

elseif  (test_wrong_by  ==  best_rank)&((a2(dip_num)  -  sum_winners)  >  best_out) 
best_out  =  a2(dip_num)  -  sum.winners; 
best_pos  =  win_pos: 
best_p  =  p; 

end 


%  look  for  best  window  pos  in  vector  b 
for  win_pos  =  win_scan_start:win_scan_stop 


%  sequence  each  column  from  matrix  into  vector 
p  =  vectoriz(feature_surf_b,win_width,win_pos): 

ste=p{ste_start:input_size);  %  this  is  patch  to  convert  STPwr  to  RMS  Amplitude 
p(ste_start:input_size)=sign(ste).*sqrl(abs(ste)): 

a1  =feval(f1,w1*p,b1); 
a2  =  feval(f2,w2*a1  ,b2); 


a2_copy  =  a2; 

[max_val  maxjaos]  =  max(a2_copy); 

a2_copy(max_pos)  =  0; 
test_wrong_by  =  0; 
sum_winners  =  0; 
while  max_pos  ~=  dip_num 
sum_winners  =  sum_winners  +  a2_copy(max_pos): 

[max_val  max_pos]  =  max(a2_copy): 
a2_copy(max_pos)  =  0; 
test_wrong_by  =  test_wrong_by  +1 ; 
end 

if  (test_wrong_by  <  best_rank) 
best_rank  =  test_wrong_by; 
best_out  =  a2(dip_num)  -  sum_winners; 
best_pos  =  win_pos  +  win_slack: 
best_p  =  p; 

elseif  (test_wrong_by  ==  best_rank)&((a2(dip_num)  -  sum_winners)  >  best_out) 
best_out  =  a2(dip_num)  -  sum_winners: 
best_pos  =  win_pos  +  win_slack: 
best_p  =  p; 
end 

end 

test_offsets(test_file_num,1 )  =  best_pos;  %  save  window  ofset  for  next  epoch 

p  =  best_p: 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  Determine  outputs  of  layers 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
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a1  =feval(f1,wrp,b1): 
a2  =  feval(f2,w2*a1  ,b2); 
e  =  t  -  a2; 

test_SSE  =  test_SSE  +  sumsqr(e); 
str  = ' 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  record  whether  NNET  output  was  correct  or  way  off 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


[max_val  maxjjos]  =  max(a2): 

test_dip_record(dip_num,1)  =  test_dip_record(dip_num,1)  + 1; 
if  max_pos  ~=  dip_num 
a2_copy  =  a2: 
a2_copy(max_pos)  =  0; 
test_wrong_by  =  0; 
while  max_pos  ~=  dip_num 
[max_val  max_pos]  =  max(a2_copy): 
a2_copy(max_pos)  =  0; 
test_wrong_by  =  test_wrong_by  +1 ; 
end 

str  =  strcat(strcat(strcat(str,'  (wrong  by  ').num2str(test_wrong_by)),')'): 
if  test_wrong_by  >  2 

test_wrong_by_alot  =  test_wrong_by_alot+1 ; 
test_dip_record(dip_num,3)  =  test_dip_record(dip_num,3)  +  1; 
end 
else 

test_correct  =  test_correct+1 ; 

test_dip_record(dip_num,2)  =  test_dip_record(dip_num,2)  +  1; 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  end  of  while  not  eof  test_file 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%%%%o/o%%%%%%% 

fclose(test_file); 

end  %  end  of  "if  sep_test_data..." 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  store  SSE(s)  in  array(s) 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

if  (singleJnstance_only=1) 

SSE_record(epoch_count)  =  SSE/num_dips: 
else 

SSE_record(epoch_count)  =  SSE/file_num; 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  display  epoch  progress 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%%o/o%o/o%%%%%%<>/o%%%%o/o 
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disp(strcat('*  SSE  went  from  ',num2str(last_SSE))); 
disp(strcat('*  to  ',num2str(SSE))): 
disp(strcat(’*  change  was  ',num2str(SSE-last_SSE))): 

disp(strcat(num2str(correct),strcat('  tmg  vects  OK  &  ',strcat(num2str(wrong_by_alot),‘  way  off')))); 
if  (sep_test_data  ==  1) 

t_SSE_record(epoch_count)  =  test_SSE/test_file_num: 

disp(strcat(num2str(test_correct),strcat('  test  vectors  OK  &  ',strcat(num2str(test_wrong_by_alot),'  way 
off')))): 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  save  weights  if  this  is  best  result  so  far 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


if  (singleJnstance_only==0) 

if  sep_test_data  ==  0 
test_correct  =  correct; 
test_wrong_by_alot  =  wrong_by_alot; 
end 

if  (test_wrong_by_alot  <  best_wayoff)l((test_wrong_by_alot  ==  best_wayoff)  &  (test_correct  > 
best_correct)) 

efname  =  'bestwts'; 

eval(['save '  efname  ]);  %  save  best  wts 
best_wayoff  =  test_wrong_by_alot; 
best_correct  =  test_correct; 
end 

if  test_SSE  <  best_SSE 
efname  =  'bestsse'; 

eval(['save '  efname  ]);  %  save  best  sse  wts 
best_SSE  =  test_SSE; 
end 

end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  momentum  &  adaptive  learning  rate  phases 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

if  SSE  >  last_SSE  *  er 

lr=lr*dm; 

MC  =  0; 

new_w1=w1; 
new_b1  =  b1 ; 
new_w2  =  w2; 
new_b2  ~b2; 

dispC*  Error  increased.  Rejecting  weight  changes  &  decreasing  learning  rate.'); 
str  =  '***'; 


else 
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if  SSE  <  last_SSE 
Ir  =  Ir  *  im; 
end 

w1  =  new_w1 ; 
b1  =  new_b1: 
w2  =  new_w2; 
b2  =  new_b2: 
last_SSE  =  SSE: 


end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  record  epoch  stats  in  report  file 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


reportjile  =  fopen(report_filename,'at+'): 

fprintf(report_file,'%d,  TRAIN  ,%f,%d,%d,',epoch_count, SSE, correct, wrong_by_alot): 
if  sep_test_data==  1 

fprintf(report_file,'  TEST,%f,%d,%d,  %s\n',test_SSE,test_correct,test_wrong_by_alot,str): 
else 

fprintf(report_file,'  %s\n',str); 
end 

fclose(report_file); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  break  out  of  for  loop  if  error  goal  reached 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


if  (singleJnstance_only  =0)  &  (SSE  <  (eg  *  file_num)) 
break; 

elseif  SSE  <  (eg  *  num_dips) 
break; 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  increment  pass  counter  &  prevent  flop  counter  overflow 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%'%,o/o% 

pass_num  =  pass_num  +  1 ; 

flop_count  =  flops;  %  prevent  flop  counter  overflow 
if  fiop_count  >=  1e9 

gigaflops  =  gigaflops  +  (flop_count/(1e9)): 
flops(O); 
end 

save  lastwts; 
end 

%  bottom  of  epoch  loop 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/oO/o%o/o%%%o/o'>/o%%%o/o%%% 
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dispc - '); 

save  finalwts;  %  save  final  state  (including  final  wts  &  bias') 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  Display  final  results 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


if  (singleJnstance_only  ==0)  &  (SSE  <  (eg  *  file_num)) 
str  =  'Error  goal  reached!  Final  SSE:'; 
elseif  (single_instance_only  ==1)  &  (SSE  <  (eg  *  num_dips)) 
str  =  'Error  goal  reached!  Final  SSE:'; 
else 

str  =  'Error  goal  not  reached!  Final  SSE:'; 
end 

disp(str); 
disp(SSE); 
if  gigaflops  >  0 

gigaflops  =  gigaflops  +  (flop_count/(1e9)); 
flops(O); 

disp('Giga-Flops:'); 

disp(gigaflops); 

else 

dispCFIops:'); 

disp(flops); 

end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  write  trailer  to  report  file 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


report_file  =  fopen(report_filename,'at+'); 
tv  =  fix(clock); 

time_stamp  =  strcat('Stop  Time  =  ',num2str(tv(1))); 
time_stamp  =  strcat(time_stamp,'/’); 
time_stamp  =  strcat(time_stamp,num2str(tv(2))); 
time_stamp  =  strcat(time_stamp,'/'); 
time_stamp  =  strcat(time_stamp,num2str(tv(3))); 
time_stamp  =  strcat(time_stamp,'  '); 
time_stamp  =  strcat(time_stamp,num2str(tv(4))); 
time_stamp  =  strcat(time_stamp,':'); 
time_stamp  =  strcat(time_stamp,num2str(tv(5))); 
time_stamp  =  strcat(time_stamp,':'); 
time_stamp  =  strcat(time_stamp,num2str(tv(6))); 
fprintf(report_file,'\n%s\n',time_stamp); 
if  gigaflops  >  0 

fprintf(report_file,'%fgiga-flops\n', gigaflops); 
else 

fprintf(report_file,'%dflops\n', flops); 
end 

fprintf(report_file,'%s  %fVn',str,SSE); 

fprintf(reportJile,'\n%s\n\n','END  OF  NNET  TRAINING  RECORD'); 
fclose(report_file); 
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C.  RECOGNIZ.M 


%  RECOGNIZ  -  Use  the  currently  loaded  weights  to  produce  a  text  file 
%  showing  diphone  detector  outputs  for  wav  files  listed  in  another 
%  text  file. 

% 

%  Sample  WAV  file  list  format: 

%  Aimit/lest/dr1/mcje0/sx33.wav  ocean  1234  3499 
%  ... 

%  Aimit/test/dr8/mtn(v0/sx343.wav  rabbit  21 234  23499 
% 

%  Sample  Output: 

%  (wordjs  ocean) 

%  ((1 008  (AE  P)  0.8)  (1 008  (R  OW)  0.6)  (1 008  (I Y  T)  0.4)) 

%  ((1010  (OW  K)  0.9)  (1010  (OW  B)  0.6)  (1010  (UH  JH)  0.4)) 

%  ... 

%  (word_end  ocean) 

% 

%  Written  by:  Major  Mark  Cantrell,  USMC 
%  Date:  20  Dec  1995 
% 

%  2  Jun  1996  -  Switched  to  2  interleaved  feature  surfaces  to  improve 

%  resolution 

%  7  Jun  1996  -  Switched  from  short-time  power  to  RMS  amplitude  in 

%  column  30  of  feature  surfaces 

% 


revision  =  '3.0'; 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%% 

%  VARIABLES  &  PARAMETERS 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

speech_data  =  Q;  %  vector  with  entire  speech  file's  contents 
sampsize  =  Q;  %  size  of  above 

fname  =  Q;  %  will  be  string  with  filename 
pname  =  Q;  %  will  be  string  with  pathname 
fpname  =  [];  %  will  be  string  with  filename  +  pathname 
vamame  =  Q;  %  will  be  string  with  filename  less  extension 

win_step  =  512-170;  %  This  is  how  much  (samples)  each  recognition  window  moves 
%  each  time  step,  resolution  is  half  this  with  interleaving 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

%  get  wav/word  file  list  filename  from  user 

%%%%%%%%%%%%%%%%%%%%%%%%%%%«>/o%%%%0/oO/o%%0/o%%% 


[fname, pname]  =  uigetfile('*.*','WAV/Word  File  List  File'); 
if  fname==0 
return 
end 

fpname=strcat(pname,fname); 
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%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

%  read  in  list  of  diphone  names 
%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

num_dips  =  0; 
dip_names  =  zeros(575,7): 

dip_file  =  fopen('diplist.txt'); 
line_read  =  ' 

while  (~feof(dip_file))  &  (length(line_read)>2)  %  for  every  diphone  listed 
line_read  =  upper(fgetl(dip_file)); 
if  length(line_read)  >  2 
num_dips  =  num_dips  +  1; 

dip_names(num_dips,:)  =  blanks(length(dip_names(num_dips,:))); 
dip_names(num_dips,1  :length(line_read))  =  line_read; 
end 
end 

fclose(dip_file); 

dip_names  =  dip_names(1:num_dips,:): 


list_file  =  fopen(fpname); 
result_file  =  fopen(‘recogniz.out','wt‘): 
line_read  =  fgetl(list_file): 


%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

%  main  list  file  loop 

%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

while  (~feof(listJile))  &  (length(line_read)>0)  %  for  every  filename  listed  in  list  file 

dispC - '): 

disp(line_read): 


%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


dotJoc=findstr(line_read,'.');  %  pick  out  pathname  from  line  read 

fpname_end=dotJoc+3: 

fpname  =  line_read(1:fpname_end): 

disp  (strcat('Pathname  is  ’.pname)); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%  pick  out  word  name 
wordname_start  =  fpname_end  +  2; 
i  =  wordname_start; 
while  line_read(i)  ~= ' ' 
i=i+1; 
end 

wordname_end  =  i  - 1 ; 

wordname  =  line_read(wordname_start:wordname_end); 
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disp(strcat('Word  is  '.wordname)); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

%  get  location  of  this  diphone's  phoneme  junction 

wordjimits  =  sscanf(line_read(wordname_end+1  :length(line_read)),’%d  %d'); 
word_start  =  wordjimits(l); 
word_end  =  word_limits(2); 

disp(strcat('Word  start  is:  ',num2str(word_start))); 
disp(strcat('Word  end  is:  ',num2str(word_end))): 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


disp(strcat('Loading  ’.fpname)); 

speech_data  =  Id16bit(fpname); 

speech_data=speech_data'; 

speech_data=speech_data(513:length(speech_data)); 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  compute  2  feature  surfaces  (interleaved  in  time) 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


[feature_surf,median_pitch,formants_a]  =  featurel(speech_data,word_start-1 70,word_end-1 70); 
feature_surf1  =  diff_mat(feature_surf): 

ste=feature_surf1(:,30):  %  convert  STP  to  RMS  amplitude 
feature_surf1(:,30)  =  sign(ste).*sqrt(abs(ste)): 

[feature_surf,median_pitch,formants_a]  =featurel(speech_data,word_start,word_end); 
feature_surf2  =  diff_mat(feature_surf); 

ste=feature_surf2(:,30);  %  convert  STP  to  RMS  amplitude 
feature_surf2(:,30)  =  sign(ste).*sqrt(abs(ste)): 


%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  determine  network  outputs  using  current  weights 
%%%%%%%%%%%%%%%%%%%%%%%%%% 


[feature_rows  fc]  =  si2e(feature_surf1); 
win_slack  =  feature_rows  -  win_width  +  1 ; 
result_matrix  =  2eros(2*win_slack,num_dips); 
for  win jDos  =  1  :win_slack 

for  i  =  1 :30  %  sequence  each  column  from  matrix  1  into  vector 

p(i*win_width-win_width+1  :i*win_width,1 )  =  feature_surf1  (win_pos:win_pos+win_width-1  ,i); 
end 

a1  =feval(f1,w1*p,b1); 
a2  =  feval(f2,w2*a1  ,b2): 
result_matrix(2*win_pos-1 ,:)  =  a2'; 

for  i  =  1 :30  %  sequence  each  column  from  matrix  2  into  vector 
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p(i*win_width-win_width+1  :i*win_width,1 )  =  feature_surf2(win_pos:win  j30s+win_width-1  ,i); 
end 

a1  =feval(f1,w1*p,b1): 
a2  =  feval(f2,w2*a1  ,b2); 
result_matrix(2*win_pos,:)  =  a2'; 

end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  write  this  word's  outputs  to  the  file 
%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


fprintf(result_file,‘(word_is  %s)\n',wordname); 

for  win_pos  =  1:2*win_slack 

outputs_copy  =  result_matrix(win_pos,:); 

[max_val  max_pos]  =  nnax(outputs_copy): 
out1  =  max_val; 
dipt  =  dip_names(max_pos,:); 
outputs_copy(max_pos)  =  0; 

[max_val  max_pos]  =  max(outputs_copy): 
out2  =  max_val: 
dip2  =  dip_names(max_pos,:); 
outputs_copy(max_pos)  =  0; 

[max_val  max_pos]  =  max(outputs_copy); 
out3  =  max_val: 
dip3  =  dip_names(max_pos,:); 
outputs_copy(max_pos)  =  0; 

fprintf(result_file,'((%d  (%s)  %f)  (%d  {%s)  %f)  (%d  (%s)  %f))\n', 

in_pos*win_step/2,dip1  ,out1  ,win_pos*win_step/2,dip2,out2,win_pos*win_step/2,dip3,out3); 


end 

fprintf(result_file,’(word_end  %s)\n',wordname); 

line_read  =  fgetl(list_file); 

end  %  end  of  while  not  EOF(file  list  file)  loop 
%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


fclose(result_file); 

fclose(list_file); 
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D.  FEATURES.M 


function  [out_matrix,out_pitch,form_matrix]  =  features(speech_data,dip_boundry) 
%FEATURES  Return  the  matrix  form  of  a  feature  vector  for  a  given 
%  speech  slice.  The  recognition  window  will  be  centered 

%  on  the  sample  number  specified  in  the  second  argument. 

%  Matrix  will  include  an  extra  two  time  periods  on  either 

%  side  of  the  24  needed  for  producing  the  final  vector. 

%  The  second  value  returned  is  median  pitch  for  sample. 

% 

%  Note:  This  version  currently  returns  a  measure  of  short-time 

%  power  in  column  30  of  the  feature  surface.  As  explained  in 

%  the  thesis,  RMS  amplitude  would  have  been  better.  TRNBPX  has 

%  been  patched  to  use  the  latter.  But  it  would  be  more  efficient 

%  in  future  if  RMS  amplitude  used  in  original  feature  surfaces. 

% 

%USAGE  EXAMPLE:  [feature_matrix  pitch]  =  features(data(100:12000),5000): 
% 

%  Revision  2.0;  Written  by  Major  M.  E.  Cantrell,  USMC 

% 

%  7  June  1996  -  Patched  to  use  RMS  amplitude  vice  STP 


fs  =  16000; 
ps  =  l/fs; 
n  =  512: 

win_overlap  =  170; 
L=round(n/10); 


%  sample  rate 
%  sample  period 
%  window  size  for  FFT 

%  how  much  sliding  window  overlaps  last  window 
%  initial  value  for  the  "low  time  lifter"  used  with  cepstrum 


speech_slice=zeros(1 ,9746); 
slice_size=9746; 

speech_start  =  dip_boundry  -  4446; 
speech_end  =  dip_boundry  +  5299; 
slice_start=  1; 
slice_end  =  9746; 

speechjength  =  length(speech_data); 


%  will  hold  the  slice  of  main  file  in 
%  in  recognition  window 

%  center  window 

%  start  of  14th  window  pos  is  @4447 


%  following  vectors  hold  start  &  stop  freqs  for  freq  bins. 

%  these  will  be  adapted  for  speaker's  pitch  &  then  used  to 
%  determine  which  formant  surface  columns  to  average  for  each 
%  freq  bin  in  output  matrix. 

bin_starts  =  [ 

0 

74.4 

149.4 

225.3 

302.7 

382.2 

464.5 

550.4 

640.6 

735.9 

837.3 

945.8 

1062.6 
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1189.1 

1326.8 

1477.6 

1643.6 

1827.2 

2031.5 
2260.0 

2516.9 

2807.4 

3137.7 

3515.5 
3950.0 

4453.1 

5039.5 

5727.5 
6541.0 
]; 


bin_encls  =  [ 
100 

175.6 

252.3 
330.8 

411.7 
495.6 

583.4 

675.8 

773.8 

878.2 

990.3 

1111.2 

1242.5 

1385.8 

1543.1 

1716.6 

1909.2 

2124.1 
2365.0 

2636.8 

2945.1 

3296.7 

3700.3 
4166.5 

4708.4 

5342.9 

6091.3 
6981.0 

8047.8 

]: 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  pad  voice  data  with  zeros  if  needed 

if  speech_start  <  1 
slice_start  =  2  -  speech_start; 
speech_start  =  1 ; 
end 
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if  speech_end  >  speech_length 
slice_end  =  4447  +  (speechjength  -  dip_boundry); 
speech_end  =  speechjength; 
end 

speech_slice(slice_start:slice_end)  =  speech_data(speech_start:speech_end); 
%pIot(speech_slice); 

%drawnow;z=input('Press  enter  to  continue...'); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  build  matrices  that  will  hold  cepstral  &  short  time  energy  data 

form_cols  =  n/2;  %  at  fs=1 6K  &  n=51 2  this  covers  0-8K  Hz 

form_rows  =  28; 

form_matrix  =  zeros(form_rows,form_cols); 
out_matrix  =  zeros(form_rows,30); 
ste_array  =  zeros(form_rows,1); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


hamming_vector=zeros(1  ,n);  %  compute  window  &  "lifter"  function  for  size=n 

hamming_vector=hamming(n)'; 

%compute  "lifter's"  coefficients  for  cepstrum/formants 
lifter1=zeros(1,n); 
last=n; 
for  i  =  0:L 

lifter!  (i+1)  =  1+cos(pi*i/(2*L)); 
lifter!  (last-i)  =  lifter!  (i+!); 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  Main  "program"  loop  starts  here 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

win_start  = ! ;  %  start  sliding  window  at  first  sample 

win_end  =  n; 

pitch_array  =  zeros(form_rows,!);  %  for  computing  median  of  pitch  values 
pitch_count  =  0; 
form_x  = ! ; 

win_size  =  win_end  -  win_start  +  ! ; 

while  (form_x  <=  form_rows)  %  main  processing  loop 

ste  =  (speech_slice(!,win_start:win_end)  .*  speech_slice(!,win_start:win_end))/!  00000; 
ste_array(form_x,!)  =  mean(ste(!,:)); 

if  max(abs(speech_slice(!,win_start:win_end)))  >  0 

%  compute  FFT 
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abs_fft=abs(fft(hamming_vector.*speech_slice(1  ,win_start:win_end))); 
%  now  compute  “cepstrum"; 


RCEP=ifft(log(abs_fft)); 
rcep=abs(RCEP(1  ;win_size/2)): 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

%  determine  pitch 


[peak_val,peak_loc]  =  max(rcep(L:length(rcep))); 
pitch_prominence  =  peak_val/(10*median(rcep(L:length(rcep)))); 
cep_pitch=fs/{L+peakJoc); 

if  (pitch_prominence  >=  1 .5)  %  1.5  is  probably  a  safer  number  (but  slow!) 
pitch=cep_pitch; 
pitch_count  =  pitch_count  +  1 ; 
pitch_array(pitch_count)=pitch; 
elseif  ((pitch_prominence  >=  0.6)  &  (pitch_count  <  1)) 

%  if  two  pitch  estimates  are  within  10  %  of  each  other,  use  cep_pitch 
a_pitch=p_amdf(speech_slice(1,win_start:win_end),fs,n-(1+((1/50)/(1/fs)))): 
if  ((((abs(cep_pitch-a_pitch))/cep_pitch)  <  0.10))  I  ((cepjaitch  >=  100)  &  (cep_pitch  <=  200)  & 
(pitch_count  <  1)) 

pitch=cep_pitch; 
pitch_count  =  pitch_count  +  1 ; 
pitch_array(pitch_count)=pitch; 
end 
else 
pitch=0; 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%  smooth  formants 


RCEPL1  =zeros(1,length(RCEP)); 

%''lifter"  out  pitch  &  rhamonics 

RCEPL1(1:length(RCEP))  =  RCEP(1:length(RCEP))  .*  lifter1(1:length(RCEP)): 

FORMANTS  =  fft(RCEPL1 );  %now  take  FFT  to  see  formants 
form_matrix(form_x,:)  =  abs(FORMANTS(1:n/2)): 

%disp('Pitch_Prom:'): 

%disp{pitch_prominence): 

%dispCPitch:‘): 

%disp(pitch); 

%plot(abs(FORMANTS(1:n/2))),title('FORMANTS'): 

%drawnow;z=input('Press  enter  to  continue...’); 
%plot(rcep(L:length(rcep))),title('CEPSTRUM'); 

%drawnow;z=input('Press  enter  to  continue...'); 

end  %  end  of  "if  non-zero  amplitude" 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  slide  window  to  next  (overlapped)  position 


form_x  =  form_x  +  1 ; 
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win_start  =  win_end  + 1  -  win_ovei1ap; 
win_end  =  win_start  +  n  - 1 ; 

end  %  while 

%mesh{form_matrix): 

%drawnow;z=input(‘Press  enter  to  continue...'); 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  main  “program"  loop  ends  here 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%  now  “normalize"  data  for  speaker's  pitch 
if  pitch_count  >  0 

pitch  =  median(pitch_array(1:pitch_count)): 
if  (pitch  <  50)  I  (pitch  >  400) 
pitch  =  100; 
out_pitch  =  0; 
else 

out_pitch  =  pitch; 
end 
else 

pitch  =  100; 
out_pitch  =  0; 
end 

bin_starts  =  bin_starts  /  (1 .1 85  -  (0.001 85  *  pitch)); 
bin_ends  =  bin_ends  /  (1 .185  -  (0.00185  *  pitch)); 

fori  =  1:29 

bin_start  =  round(n*bin_starts(i)/fs); 
if  bin_start  <  1 
bin_start  =  1 ; 
end 

if  bin_start  >  n/2 
bin_start  =  n/2; 
end 

bin_end  =  round(n*bin_ends(i)/fs); 
if  bin_end  <  1 
bin_end  =  1 ; 
end 

if  bin_end  >  n/2 
bin_end  =  n/2; 
end 

for  j  =  1  :form_rows 

out_matrix(j,i)  =  mean(form_matrix(j,bin_start:bin_end)); 
end 
end 


out_matrix(:,30)  =  ste_array(:,1); 
%mesh(out_matrix); 

%drawnow;z=input(’Press  enter  to  continue...'); 
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E.  FEATUREL.M 


function  [out_matrix,out_pitch,form_matrix]  =  featurel(speech_data, start, end) 
%FEATUREL  Return  the  matrix  form  of  a  feature  vector  for  a  given 
%  speech  slice. 

%  The  second  value  returned  is  median  pitch  for  sample. 

% 

%  Note:  This  version  currently  returns  a  measure  of  short-time 

%  power  in  column  30  of  the  feature  surface.  As  explained  in 

%  the  thesis,  RMS  amplitude  would  have  been  better.  RECOGNIZ  has 

%  been  patched  to  use  the  latter.  But  it  would  be  more  efficient 

%  in  future  if  RMS  amplitude  used  in  original  feature  surfaces. 

% 

%USAGE  EXAMPLE:  [feature_matrix  pitch]  =  featurel(data(100:12000)): 

% 

%  Revision  2.0;  Written  by  Major  M.  E.  Cantrell,  USMC 
% 

%  7  June  1996  -  Patched  to  use  RMS  amplitude  vice  STP 


fs  =  16000;  %  sample  rate 

ps  =  1/fs;  %  sample  period 

n  =  512;  %  window  size  for  FFT 

win_overlap  =  170;  %  how  much  sliding  window  overlaps  last  window 

L=round(n/10);  %  initial  value  for  the  "low  time  lifter"  used  with  cepstrum 

%  center  window 

speech_start  =  start  -  4446;  %  start  of  14th  window  pos  is  @4447 

speech_end  =  end  +  5299; 

speech_slice=zeros(1  ,speech_end-speech_start+1 ) ; 

slice_start  =  1 ; 

slice_end  =  length(speech_slice); 
slice_size  =  slice_end; 
speech_length  =  length(speech_data); 


%  pad  voice  data  with  zeros  if  needed 

if  speech_start  <  1 
slice_start  =  2  -  speech_start; 
speech_start  =  1 ; 
end 

if  speech_end  >  speech_length 
speech_end  =  speechjength; 
siice_end  =  slice_start  +  (speech_end  -  speech_start); 

%  slice_end  =  4447  +  (speechjength  -  end); 

%  speech_end  =  speechjength; 
end 

speech_slice(siice_start:slice_end)  =  speech_data(speech_start:speech_end); 

%plot(speech_slice) ; 

%drawnow; 
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%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  following  vectors  hold  start  &  stop  freqs  for  freq  bins. 

%  these  will  be  adapted  for  speaker's  pitch  &  then  used  to 
%  determine  which  formant  surface  columns  to  average  for  each 
%  freq  bin  in  output  matrix. 
bin_starts  =  [ 

0 

74.4 

149.4 

225.3 

302.7 

382.2 

464.5 

550.4 

640.6 
735.9 

837.3 

945.8 

1062.6 

1189.1 

1326.8 

1477.6 

1643.6 

1827.2 
2031.5 
2260.0 

2516.9 

2807.4 

3137.7 

3515.5 
3950.0 

4453.1 

5039.5 

5727.5 
6541.0 
]; 


bin_ends  =  [ 
100 

175.6 

252.3 
330.8 

411.7 
495.6 

583.4 

675.8 

773.8 

878.2 

990.3 

1111.2 

1242.5 

1385.8 

1543.1 

1716.6 

1909.2 
2124.1 
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2365.0 

2636.8 
2945.1 

3296.7 

3700.3 
4166.5 

4708.4 

5342.9 
6091.3 
6981.0 

8047.8 

]: 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%  build  matrices  that  will  hold  cepstral  &  short  time  energy  data 


form_cols  =  n/2:  %  at  fs=1 6K  &  n=51 2  this  covers  0-8K  Hz 

form_rows  =  round(slice_size/(n-win_overlap))  + 1 ; 


form_matrix  =  zeros(form_rows,form_cols); 


out_matrix  =  zeros(form_rows,30): 
ste_array  =  zeros(form_rows,1 ); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


hamming_vector=zeros(1  ,n);  %  compute  window  &  "lifter"  function  for  size=n 

hamming_vector=hamming(n)'; 

%compute  "lifter’s"  coefficients  for  cepstrum/formants 
lifter1=zeros(1,n); 
last=n; 
for  i  =  0:L 

Iifter1(i+1)  =  1+cos(pi*i/(2*L)): 
lifterl(last-i)  =  Iifter1(i+1): 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  Main  "program"  loop  starts  here 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

win_start  =  1 ;  %  start  sliding  window  at  first  sample 

win_end  =  n; 

pitch_array  =  zeros(form_rows,1);  %  for  computing  median  of  pitch  values 
pitch_count  =  0; 
form_x  =  1 ; 

win_size  =  win_end  -  win_start  +  1 ; 

while  (win_end  <=  slice_end)  %  main  processing  loop 

ste  =  (speech_slice(1,win_start:win_end)  .*  speech_slice(1,win_start:win_end))/1 00000; 
ste_array(form_x,1)  =  mean(ste(1,:)): 

if  max(abs(speech_slice(1  ,win_start:win_end)))  >  0 
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%  compute  FFT 

abs_fft=abs(fft(hamming_vector.*speech_sIice{1  ,win_stai1:win_end))); 

%  now  compute  "cepstrum"; 

RCEP=ifft(log(abs_fft)): 
rcep=abs(RCEP(1  :win_si2e/2)): 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

%  determine  pitch 

[peak_val,peakJoc]  =  max(rcep(L:length(rcep))): 
pitch_prominence  =  peak_val/(10*median(rcep(L:length(rcep)))); 
cep_pitch=fs/(L+peak_loc) ; 
if  (pitch_prominence  >=  1 .5) 
pitch=cep_pitch; 
pitch_count  =  pitch_count  + 1 ; 
pitch_array(pitch_count)=pitch; 
elseif  (pitch_prominence  >=  0.6) 

%  if  two  pitch  estimates  are  within  10  %  of  each  other,  use  cep_pitch 
a_pitch=p_amdf(speech_slice(1  ,win_start:win_end),fs,n-(1  +((1/50)/(1  /fs)))); 
if  ((((abs(cep_pitch-a_pitch))/cep_pitch)  <  0.10))  I  ((cep_pitch  >=  100)  &  (cep_pitch  <=  200)  & 
(pitch_count  <  1)) 

pitch=cep_pitch: 
pitch_count  =  pitch_count  +  1 ; 
pitch_array(pitch_count)=pitch: 
end 
else 
pitch=0: 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  smooth  formants  using  real  cepstrum 


RCEPL1  =zeros(1,length(RCEP)); 

%"lifter“  out  pitch  &  rhamonics 

RCEPL1(1:length(RCEP))  =  RCEP(1  :length{RCEP))  .*  lifterl  (1  :length(RCEP)); 

FORMANTS  =  fft(RCEPL1 );  %now  take  FFT  to  see  formants 
form_matrix(form_x,:)  =  abs(FORMANTS{1  :n/2)): 

%disp('Pitch_Prom:'): 

%disp(pitch_prominence): 

%disp('Pitch:'): 

%disp(pitch): 

%plot(abs(FORMANTS(1:n/2))),titleCFORMANTS'): 

%pause; 

%plot(rcep(L:length(rcep))),title('CEPSTRUM'); 

%pause; 


end  %  end  of  “if  non-zero  amplitude" 


%  slide  window  to  next  (overlapped)  position 


139 


form_x  =  form_x  + 1 ; 

win_start  =  win_end  + 1  -  win_overlap: 
win_end  =  win_start  +  n  - 1; 

end  %  while 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  main  “program"  loop  ends  here 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%  now  "normalize"  data  for  speaker's  pitch 
if  pitch_count  >  0 

pitch  =  median(pitch_array(1  :pitch_count)): 
if  (pitch  <  50)  I  (pitch  >  400) 
pitch  =  100; 
out jjitch  =  0; 
else 

ouLpitch  =  pitch; 
end 
else 

pitch  =  100; 
out_pitch  =  0; 
end 

bin_starts  =  bin_starts  /  (1 .185  -  (0.00185  *  pitch)); 
bin_ends  =  bin_ends  /  (1.185  -  (0.00185  *  pitch)); 

fori  =  1:29 

bin_start  =  round(n*bin_starts(i)/fs); 
if  bin_start  <  1 
bin_start  =  1 ; 
end 

if  bin_start  >  n/2 
bin_start  =  n/2; 
end 

bin_end  =  round(n*bin_ends(i)/fs); 
if  bin_end  <  1 
bin_end  =  1 ; 
end 

if  bin_end  >  n/2 
bin_end  =  n/2; 
end 

for  j  =  1  :form_rows 

out_matrix(j,i)  =  mean(form_matrix(j,bin_start:bin_end)); 
end 
end 


out_matrix(:,30)  =  ste_array(:,1); 

%plot(ste_array(:,1 )); 
%drawnow; 
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F.  TESTNET.M 


%  TESTNET  -  Gathers  statistics  for  neural  network  weights  trained  by  TRNBPX. 
%  These  weights  must  already  be  loaded  when  this  script  file  is 
%  started. 

% 

%  You  must  provide  two  text  files: 

% 

%  (1)  A  text  file  listing  the  diphones  to  be  recognized  (one  diphone 
%  per  line,  a  single  space  separating  the  phonemes  of  the  diphone, 

%  and  all  caps) 

% 

%  (2)  A  text  file  listing  the  test  MAT  files  created  by  MAKVECTS.M 
% 

%  Written  by:  Major  Mark  Cantrell,  USMC  Feb  1996 
% 

%  Last  Updated  8  Feb  96  (new  time  alignment  method) 

%  20  Apr  96  (bug  fixes) 

%  7  Jun  96  (patched  to  use  RMS  Amplitude  vice  short-time  power) 

revision  =  '5.2'; 

if  ~exist('disp_files') 
dispjiles  =  0; 
end 

max_win_offset  =  input('Maximum  offset  from  center  for  recognition  window?  '); 
mid_pos  =  round((27-win_width)/2)  -t- 1; 
win_slack  =  27-win_width+1 ; 

win_scan_start  =  14  -  (round(win_width/2)  +  max_win_offset  - 1); 
win_scan_stop  =  win_scan_start  +  (2  *  max_win_offset); 
if  win_scan_start  <  1 
win_scan_start  =  1; 
end 

if  win_scan_stop  >  win_slack 
win_scan_stop  =  win_slack: 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%%%o/o%%%o/o%%o/oO/o%%%%%%% 

%  write  header  to  report  file 

report_filename  =  'nntest.txt': 
report_file  =  fopen(report_filename,'at+'): 

fprintf(reportJile,'%s\n',' - '); 

fprintf(reportJile,’%s%s%s\n\n','NNET  TEST  RECORD  Ver.  ’,revision',‘ 

(Epochs,SSEs,#Correct,#MoreThan3off)'): 

fprintf(reportJile,’%s\n',remarks): 

fprintf(report_file,'%s\n  win_width  =  %d,  max_offset  =  %d,  seed  =  %d\n',win_width,max_win_offset): 
fprintf(report_file,'%s/%s:  Hidden  =  %d,  Num_Dips  =  %d ,  me  =  %f ,  er  = 

%f\n\n',f1  ,f2,hidden_size,num_dips,mc,er): 
tv  =  fix(clock): 

time_stamp  =  strcat('Start  Time  =  ',num2str(tv(1))): 
time_stamp  =  strcat(time_stamp,'/‘); 
time_stamp  =  strcat(time_stamp,num2str(tv(2))); 
time_stamp  =  strcat(time_stamp,'/’); 
time_stamp  =  strcat(time_stamp,num2str(tv(3))); 
time_stamp  =  strcat(time_stamp,'  '); 
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time_stamp  =  strcat(time_stamp,num2str(tv(4))): 
time_stamp  =  strcat(time_stamp,':'); 
time_stamp  =  strcat(time_stamp,num2str(tv(5))); 
time_stamp  =  strcat(time_stamp,':’); 
time_stamp  =  strcat(time_stamp,num2str(tv(6))): 
fprintf(reporl_file,'%s\n',time_stamp); 
fclose(report_file); 

test_record  =  zeros(num_dips,5); 
test_offsets  =  2eros(1 ,1 ); 
pass_num  =  1 ; 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%  TEST  CODE  START 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


disp(strcat(num2str(epoch_count),' - [TESTING] - ')); 

disp(fix(clock)); 

test_file  =  fopen(tfpname); 

line_read  =  "; 
dip_name  =  blanks(7); 
test_file_num  =  0; 
test_SSE  =  0; 
test_coiTect  =  0; 
test_wrong_by_alot  =  0; 

test_dip_record  =  2eros(num_dips,3);  %  for  each  dip  in  last  pass:  #  seen,  #  correct,  #  way  off 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  test  file  loop 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


report_file  =  fopen(report_filename,'at+'); 

fprintf(report_file,^nDIP  OUT  ER  RECOG  OUT  OFFSET  \n‘): 

while  (-feof(test_file))  &  (length(line_read)>0)  %  for  every  test  file 
line_read  =  (fgetl(test_file)); 
if  length(line_read)  >  2 

filename  =  line_read;  %  used  to  load  file 
test_file_num  =  test_file_num  +  1 ; 
if  disp_files  ==  1 

disp(strcat(strcat(num2str(test_file_num),'  =  test  file  '),line_read)): 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  pick  out  diphone  namefrom  filename 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
dotJoc=findstr(filename,'.'): 
if  dotJooO 

dip_str=filename(1  :dot_loc-1 ); 
j  =  length(dip_str): 

while  (dip_str(j)  >=  '0')  &  (dip_str(j)  <=  '9') 
j  =  j-1: 

end 
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i  =  j: 

while  (dip_str(i)  -=  7)  &  (dip_str(i)  -=  V)  &  (i  >  1) 

end 
if  i  >  1 
i  =  i  +  1: 
end 

dip_name  =  blanks(7); 
dip_name{1:j-i+1)  =  upper(dip_str(i:j)): 
for  i  =  1  :length(dip_name)  %  get  rid  of  hyphen 

if  dip_name(i)  == 
dip_name(i)  = ' 
end 
end 
else 

error(strcat('Bad  TEST  filename: filename)); 
end 

else 

break;  %  bail  out  of  eof  loop  if  line  blank 

end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  determine  target  output  vector 

%  note  that  dipnames  file  MUST  be  alpha  sorted  &  pound  ('#')  signs  replaced  by  'X' !!! 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%  use  bin  search  to  find  dip's  column  number 
dip_num  =  dipnumof(dip_name,dip_names(1:num_dips,:)); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  load  test  MAT  file 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
load(filename,  '-mat'); 
p  =  zeros(win_width*30,1); 

best_out  =  -inf; 
bestjaos  =  0; 
best_rank  =  num_dips; 


%  look  for  best  window  pos  in  vector  a 
for  win_pos  =  win_scan_start:win_scan_stop 

%  sequence  each  column  from  matrix  into  vector 
p  =  vectoriz(feature_surf_a,win_width,win_pos); 

ste=p(ste_start:input_size);  %  this  is  patch  to  convert  STPwr  to  RMS  Amplitude 
p(ste_start:input_size)=sign(ste).*sqrt(abs{ste)); 

a1  =  feval(f1,w1*p,b1); 
a2  =  feval(f2,w2*a1,b2); 


a2_copy  =  a2; 

[max_val  max_pos]  =  max(a2_copy); 


a2_copy(max_pos)  =  0; 
test_wrong_by  =  0; 
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sum_winners  =  0; 
while  max_pos  ~=  dip_num 
sum_winners  =  sum_winners  +  a2_copy(max_pos): 

[max_val  max_pos]  =  nnax(a2_copy): 
a2_copy(max_pos)  =  0; 
test_wrong_by  =  test_wrong_by  +1 ; 
end 

if  (test_wrong_by  <  best_rank) 
best_rank  =  test_wrong_by; 
best_out  =  a2(dip_num)  -  sum_winners: 
best_pos  =  win_pos: 
best_p  =  p; 

elseif  (test_wrong_by  ==  best_rank)&((a2{dip_num)  -  sum_winners)  >  best_out) 
best_out  =  a2(dip_num)  -  sum_winners; 
best_pos  =  win_pos; 
best_p  =  p; 
end 


%sleep  for  niceness  seconds  if  during  work  hours 

tv  =  fix(clock); 

if  (niceness  >  0)  &  (tv(4)  >  6)  &  (tv(4)  <  15) 
pause(niceness); 
end 


%  look  for  best  window  position  in  vector  b 
for  win_pos  =  win_scan_starl:win_scan_stop 

%  sequence  each  column  from  matrix  into  vector 
p  =  vectoriz(feature_surf_b.win_width,win_pos): 

ste=p(ste_start;input_size):  %  this  is  patch  to  convert  STPwr  to  RMS  Amplitude 
p(ste_start:input_size)=sign(ste).*sqrt(abs(ste)): 

a1  =  feval(f  1  ,w1  *p,b1 ); 
a2  =  feval(f2,w2*a1  ,b2); 


a2_copy  =  a2: 

[max_val  max_pos]  =  max(a2_copy); 

a2_copy(max_pos)  =  0; 
test_wrong_by  =  0; 
sum_winners  =  0; 
while  max_pos  -=  dip_num 
sum_winners  =  sum_winners  +  a2_copy(max_pos): 
[max_val  max_pos]  =  max(a2_copy): 
a2_copy(max_pos)  =  0; 
test_wrong_by  =  test_wrong_by  +1 ; 
end 

if  (test_wrong_by  <  best_rank) 
best_rank  =  test_wrong_by; 
best_out  =  a2(dip_num)  •  sum_winners; 
best_pos  =  win_pos  +  win_slack; 
best_p  =  p; 
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elseif  (test_wrong_by  ==  best_rank)&((a2(dip_num)  -  sum_winners)  >  best_out) 
best_out  =  a2(dip_num)  -  sum_winners: 
best_pos  =  win_pos  +  win_slack: 
best_p  =  p; 
end 

end 

%sleep  for  niceness  seconds  if  during  work  hours 

tv  =  fix(clock); 

if  (niceness  >  0)  &  (tv(4)  >  6)  &  (tv(4)  <  15) 
pause(niceness); 
end 

test_offsets(test_file_num,1 )  =  best_pos;  %  save  window  ofset  for  next  epoch 

p  =  best_p: 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  PRESENTATION  PHASE  -  Determine  outputs  of  layers 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


a1  =feval(f1,w1*p,b1); 
a2  =  feval(f2,w2*a1  ,b2); 
e  =  t  -  a2: 

test_SSE  =  test_SSE  +  sumsqr(e); 

%sleep  for  niceness  seconds  if  during  work  hours 

tv  =  fix(clock); 

if  (niceness  >  0)  &  (tv(4)  >  6)  &  (tv(4)  <  15) 
pause(niceness); 
end 


if  disp_files  ==  1  %  display  progress 

str  =strcat(strcat(num2str(epoch_count),'  =  epoch.  SSE  =  '),num2str(sumsqr(e))); 
else 
str  =  "; 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  record  whether  NNET  output  was  correct  or  way  off 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%%% 


[max_val  max_pos]  =  max(a2); 
test_record(test_file_num,:)  =  zeros(1,5): 
test_record(test_fiIe_num,1)  =  dip_num: 
test_record(test_file_num,2)  =  a2(dip_num): 
test_record(test_file_num,4)  =  max_pos; 
test_record(test_file_num,5)  =  max_val; 

test_dip_record(dip_num,1)  =  test_dip_record(dip_num,1)  +  1; 
if  max_pos  ~=  dip_num 
a2_copy  =  a2; 
a2_copy(max_pos)  =  0; 
test_wrongLby  =  0; 
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while  max_pos  ~=  dip_num 
[max_val  max_pos]  =  max(a2_copy): 
a2_copy(max_pos)  =  0; 
test_wrong_by  =  test_wrong_by  +1 ; 
end 

test_record(test_file_num,3)  =  test_wrong_by; 

str  =  strcat(strcat(strcat(str,'  (wrong  by  '),num2str(test_wrong_by)),')'): 

if  test_wrong_by  >  2 

test_wrong_by_alot  =  test_wrong_by_alot+1 ; 
test_dip_record(dip_num,3)  =  test_dip_record(dip_num,3)  +  1 ; 
end 
else 

test_record(test_file_num,3)  =  0; 
test_correct  =  test_correct+1 ; 

test_dip_record(dip_num,2)  =  test_dip_record(dip_num,2)  +  1 ; 
end 

if  dispjiles  ==  1 

str  =  strcat(str,strcat('  @',num2str(best_pos))): 
disp(str); 

disp(test_record(test_file_num,:)); 

end 

V  =  test_record(test_file_num,:); 
o  =  test_offsets(test_file_num,1); 

fprintf(report_file,'%s,  %f,  %d,  %s,  %f,%d\n', 

dip_names(v(1),:),v(2),v(3),dip_names(v(4),:),v(5),o); 


end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  end  of  while  not  eof  testjile 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


fclose(test_file); 

end  %  end  of  "if  sep_test_data..." 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%  record  epoch  stats  in  report  file 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


report_file  =  fopen(report_filename,'at+‘): 

fprintf(report_file,'\n%d,  TRAIN, %f,%d,%d,',epoch_count,SSE, correct, wrong_by_alot); 
if  sep_test_data 

fprintf(report_file,'  TEST,%f,%d,%d,  %s\n',test_SSE,test_correct,test_wrong_by_alot,str): 
else 

fprintf(report_file,'  %s\n',str): 
end 

fclose(reporl_file): 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  write  trailer  to  report  file 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%%% 

reportjile  =  fopen(report_filename,'at+'); 
tv  =  fix(clock): 


146 


time_stamp  =  strcat('Stop  Time  =  ',num2str(tv(1))): 
time_stamp  =  strcat(time_stamp,'/’); 
time_stamp  =  strcat(time_stamp,num2str(tv(2))): 
time_stamp  =  strcat(time_stamp,'/'); 
time_stamp  =  strcat(time_stamp,num2str(tv(3))): 
time_stamp  =  strcat(time_stamp,‘  '); 
time_stamp  =  strcat(time_stamp,num2str(tv(4))): 
time_stamp  =  strcat(time_stamp,':'); 
time_stamp  =  strcat(time_stamp,num2str(tv(5))): 
time_stamp  =  strcat(time_stamp,':'); 
time_stamp  =  strcat(time_stamp,num2str(tv(6))): 
fprintf(report_fileAn%s\n',time_stamp); 
if  gigaflops  >  0 

fprintf(report_file,'%fgiga-flops\n', gigaflops); 
else 

fprintf(report_file,'%d  flopsVn', flops); 
end 

fprintf(report_file,'%s  %f\n',str,SSE): 
fprintf(report_fileAn%s\n\n',’END  OF  NNETTEST  RECORD'); 
fclose(report_file); 

save  testrec 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%<%0^0/^ 
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G.  MAKVECTS.M 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  MAKVECTS.M  -  MatLab  script  file  for  creating  precomputed  feature 
%  vector  files  for  use  by  TRNBPX. 

% 

%  Written  by  Major  M.E.  Cantrell,  USMC 
% 

%  Last  Modified: 

% 

%  7  June  1996 

% 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


clear; 

cic; 

ch  =  input('Use  frequency  normalization?  ','s'): 
if  upper(ch(1))  ==  'N‘ 
freq_norm  =  0; 
out_ext  =  '.BAD'; 
else 

freq_norm  =  1 ; 
out_ext  =  '.vec'; 
end 


ch  =  input{Two  versions?  ','s'); 
if  upper(ch(1))  ==  'N' 
two_ver  =  0; 
else 

two_ver=  1; 
end 

niceness_setting  =  input('Niceness  setting  (will  sleep  that  many  seconds  per  file)? '); 

save  nice.txt  niceness_setting  -ascii 
niceness  =  niceness_setting; 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


speech_data  =  [];  %  vector  with  entire  speech  file's  contents 

sampsize  =  D;  %  size  of  above 

fname  =  Q;  %  will  be  string  with  filename 
pname  =  □;  %  will  be  string  with  pathname 

fpname  =  Q;  %  will  be  string  with  filename  +  pathname 
varname  =  Q;  %  will  be  string  with  filename  less  extension 

last_pname  =  0; 

vect_count  =  0; 
no_pitch_count  =  0; 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%<>/o%%%%%% 
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%  get  file  list  filename  from  user 

[Ifname.Ipname]  =  uigetfile('*.*', 'Filename  Containing  List  of  Voice  Files’); 
if  lfname==0 
return 
end 

lfpname=strcat(lpname,lfname); 
ldotJoc=findstr(lfname,‘.'): 
if  length(ldotjoc)>0 
lvamame=lfname(1  :ldotJoc-1 ); 
lfname_ext=upper(lfname(ldotJoc+1:length(lfname))): 
else 

lvarname=lfname; 

end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


listjile  =  fopen(lfpname): 
badmatjile  =  fopen('badmats.dir','wt'); 
report_file  =  fopen('makevrpt.txt','wt'); 

fprintf(report_file, 'FOLLOWING  DIPHONES  WERE  UNVOICED:\n\n’): 
tic;  %  start  timer 
line_read  =  fgetl(list_file); 

while  (~feof(list_file))  &  (length(line_read)>0)  %  for  every  filename  listed  in  list  file 
%sleep  for  niceness  seconds  if  during  work  hours 

tv  =  fix(clock); 

if  (niceness  >  0)  &  (tv(4)  >  6)  &  (tv{4)  <  1 8) 
pause(niceness); 
end 

dispC - '); 

disp(fix(clock)); 

disp(line_read); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


dotJoc=findstr(line_read,’.');  %  pick  out  pathname  from  line  read 

fpname_end=dotJoc+3; 

fpname  =  line_read(1  :fpname_end); 

i  =  dotJoc-1; 

while  (i  >  0)  & 

(((fpname(i)>=’A')&(fpname(i)<='Z'))l((fpname(i)>='a')&(fpname(i)<='z’))l((fpname(i)>=’0')&(fpname(i)<=’9 

■))) 

i=i-1; 

end 

pname=fpname(1:i); 
vamame=fpname{i+1  :dotJoc-1); 

disp  (strcat('Pathname  is  ’.pname)); 
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%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%  pick  out  diphone  name 
dipname_start  =  fpname_end  +  2; 
i  =  dipname_start: 
while  line_read(i)  ~= ' ' 
i=i+1; 
end 

dipname_end  =  i  - 1 ; 

dipname  =  upper(line_read(dipname_start:dipname_end)); 
disp(strcat('Diphone  is  '.dipname)); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%  get  location  of  this  diphone's  phoneme  junction 

dipjimits  =  sscanf(line_read(dipname_end+1  :length(line_read)),‘%d  %d  %d'); 
dip  Junction  =  dipjimits(2): 

disp(strcat('Diphone  junction  is:  ■,num2str(dip Junction))); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


disp(strcat(’Loading  '.fpname)); 

speech_data  =  Id16bit(fpname); 

speech_data=speech_data’; 

speech_data=speech_data(513:length(speech_data)); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%  create  feature  "vectors" 
dispCComputing  feature  vectors...'); 

if  freq_norm  ==  1 

[feature_surf,median_pitch,formants_a]  =  features(speech_data,dipJunction); 
feature_surf_a  =  diff_mat(feature_surf); 
if  two_ver  ==  1 

[feature_surf,median_pitch_2,formants_b]  =  features(speech_data,dipJunction-1 70); 
feature_surf_b  =  diff_mat(feature_surf); 
end 
else 

[feature_surf,median_pitch,formants_a]  =  featureb(speech_data,dip Junction); 
feature_surf_a  =  diff_mat(feature_surf); 
if  two_ver  ==  1 

[feature_surf,median_pitch_2,formants_b]  =  featureb(speech_data,dip Junction-170); 
feature_surf_b  =  diff_mat(feature_surf); 
end 
end 


if  median_pitch  ==  0 
median_pitch  =  median_pitch_2; 
end 
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%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


%  save  feature  "vectors"  to  a  file  with  same  path 

filenum  =  0; 
file_exists  =  1; 
while  file_exists 

outjpname  =  strcat{pname,dipname): 
outjpname  =  strcat(out_fpname,num2str(filenum)); 
outjpname  =  lower(strcat(outJpname,out_ext)): 
file_exists  =  exist(outjpname); 
filenum  =  filenum  + 1 ; 
end 

disp(strcat('Median  pitch  is  ',num2str(median_pitch))); 

if  median_pitch  <=  0 
no_pitch_count  =  no_pitch_count  + 1 ; 
fprintf(reportJile,'%s\n',line_read): 
fprintf(badmatjile,‘%s\n', outjpname); 
end 

vect_count  =  vect_count  + 1 ; 

disp{strcat('Vector  count  now  ',num2str(vect_count))): 

disp(vect_count); 

disp(strcat('Saving  features  to  '.outjpname)); 
if  two_ver  ==  1 

eval(['save  ’  outjpname '  feature_surf_a  feature_surf_b  median_pitch']); 
else 

eval(['save  ’  outjpname '  feature_surf_a  median_pitch']); 
end 

clear  speech_data;  %  clean  up  workspace  before  next  phoneme 

line_read  =  fgetl(listjile);  %  get  next  filename  from  list  file 

end  %  while  not  eof(listjile) 

fclose(llstjile); 

fclose(reportjile); 

fclose(badmatjlle); 

disp(Total  elapsed  time;'); 

toe;  %  display  total  elapsed  time  in  seconds 

pack; 
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H.  INPUTS.M 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  INPUTS  -  Get  training  parameters  from  user  for  use  by  tmbpx.m 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


revision  =  i.0‘; 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

%  GET  USER  DEFINABLE  PARAMETERS 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


remarks2  = ' 

me  =  input('Maximum  total  epochs?  '); 

me  =  input('Momentum  Constant  (0.90  to  .95  is  standard)? '); 

er  =  input('Allowable  Error  Ratio  (1.04  to  1.05  is  standard)? '); 

eg  =  input('Error  goal  per  output  (0.02  is  standard)? '); 

ch  =  input('Reset  weights  to  random  values?  ’,‘s'): 
if  upper(ch(1))  ==  'Y' 

hidden_size  =  input('How  many  neurons  in  hidden  layer? '); 
win_width  =  input('Width  of  recognition  window  (13  standard)?  '); 
reset_weights  =  1 ; 
remarks2  =  'New  Weights, 
else 

reset_weights  =  0; 
remarks2  =  ‘Same  Weights, 
end 

ch  =  input(Train  on  first  instance  of  each  diphone  only?  ’,'s'); 
if  upper(ch(1))  =  'Y' 
singleJnstance_only  =  1 ; 

remarks2  =  strcat(remarks2,'First  Instance  Only, '); 
else 

singleJnstance_only  =  0; 
remarks2  =  strcat(remarks2,'Entire  File  Set, '); 
end 

ch  =  input('Batch  weight  changes?  ’,’s'); 
if  upper(ch(1))  ==  'Y' 
batch  =  1 ; 
else 

batch  =  0; 

remarks2  =  strcat(remarks2,'NO  BATCHING, '); 
end 

if  (singleJnstance_only  ==  0) 
ch  =  input(Time-Align  Feature  Vectors?  ','s'); 
if  upper(ch(1))  =  'Y' 
time_align  =  1; 

remarks2  =  strcat(remarks2,'Time-Aligning, '); 
ch  =  input('Use  static  file  offsets?  ','s'): 
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if  upper(ch(1))  ==  'Y' 
static_offsets  =  1 ; 
if  exist('tmoffs.mat') 
offname  =  trnoffs'; 
else 
Is 

offname  =  input('What  is  name  of  matlab  file  with  offsets  (only)?  ','s'): 
end 
else 

static_offsets  =  0; 
offname  =  'trnoffs'; 
end 

max_win_offset  =  input('Maximum  offset  from  center  for  recognition  window? '); 
else 

time_align  =  0; 

remarks2  =  strcat(remarks2,'Not  Time-Aligning, '); 
static_offsets  =  0; 
offname  =  'trnoffs'; 
end 

ch  =  input('Separate  test  data?  ','s'); 
if  upper(ch(1))  ==  'Y' 
sep_test_data  =  1 ; 

remarks2  =  strcat(remarks2,'Separate  Test  Set,'); 
if  time_align  ==  0 

max_win_offset  =  input('Maximum  offset  from  center  for  test  files? '); 
end 
else 

sep_test_data  =  0; 

remarks2  =  strcat(remarks2,'No  Separate  Test  Set,'); 
end 
else 

sep_test_data  =  0; 
time_align  =  0; 
static_offsets  =  0; 
max_win_offset  =  1; 
end 

remarks  =  input('Remarks  for  record?  ','s‘); 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%o/o%%%o/o%%%%%%o/o%%%%o/o 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%<>/o%%%%%o/o%%%%%%%%o/o% 

niceness_setting  =  input('Niceness  setting  (will  sleep  9  times  that  many  seconds  per  file)?  '); 

save  nice.txt  niceness_setting  -ascii 
niceness  =  niceness_setting; 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%'>/o%%%%'>/o%%<>/o%%%%%%<>/o%% 

%  determine  operating  system  in  use 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%'>/o%o/o%%%%%%%%%%%%%%% 

if  findstr(Y,pwd)  ==  Q 
op_sys  =  'unix'; 
else 

op_sys  =  'dos '; 
end 
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%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  get  diphone  list  filename  from  user 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
Ipname  =  pwd; 
if  exist('diplist.txt') 

Ifname  =  'diplist.txt'; 
else 
Is 

Ifname  =  input('What  is  name  of  diphone  list  file?  ','s'); 
end 

if  op_sys  ==  'Unix' 

Ipname  =  strcatfipname,'/'); 
else 

Ipname  =  strcat(lpname,'V): 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  get  training  file  list  filename  from  user 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
mpname  =  pwd; 

Is 

mfname  =  input('What  is  name  of  training  file  list?  ‘,'s'); 
if  op_sys  ==  'Unix* 
mpname  =  strcatfmpname,'/'); 
else 

mpname  =  strcat(mpname,T); 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  get  test  file  list  filename  from  user 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
if  (sep_test_data  ==  1) 
tpname  =  pwd; 

Is 

tfname  =  input('What  is  name  of  test  file  list?  ‘.'s'); 
if  op_sys  ==  'unix' 
tpname  =  strcat(tpname,'/'); 
else 

tpname  =  strcat(tpname,T); 
end 
end 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%  save  inputs  to  a  mat  file 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 


save  inputs 
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I.  P_AMDF.M 


function  [pitch]  =  p_amdf(speech_data,fs,n) 

%P_AMDF  Voice  Pitch  using  Average  Magnitude  Difference  Function 

% 

%  This  function  computes  the  pitch  of  a  voice  signai  (first  argument). 

%  The  sample  frequency  is  required  as  the  second  argument  and  the  third 

%  must  be  the  AMDF  window  size.  Because  of  the  nature  of  the  algorithm,  the 

%  speech  vector  must  be  longer  than  the  window  size  by  1+((1/50)/(1/fs))  or 
%  321  at  16  KHz.  This  version  is  specifically  intended  for  voice  signals. 

%  So  it  wiii  oniy  detect  pitches  in  the  range  50  to  500  Hz.  Note  that 
%  accuracy  suffers  near  the  end  of  a  voiced  segment  of  speech.  Also,  this 

%  func  can  be  slow.  Try  a  smaller  N  to  speed  things  up.  You'll  probably 

%  get  the  same  answer  as  you  would  with  a  larger  N  (this  N  is  not  related 

%  to  the  N  used  with  FFT ;  it  only  determines  how  much  of  speech  segment 

%  to  use  for  pitch  determination). 

% 

%  This  code  is  based  on  [Ainsworth  p96].  See  thesis  for  complete  reference 
%  information. 

% 

% 

%USAGE  EXAMPLE:  pitch  =  p_amdf(data(1  :n+321  ),fs,n): 

% 


if  -(nargin  ==  3) 

error(’invalid  number  of  arguments  to  P_AMDF.M'): 
end 

maxjag  =  1+(1/50)/(1/fs):  %  vaiid  pitch  range  50-500  Hz 
minjag  =  1+(1/500)/(1/fs); 

last=length(speech_data); 

if  n+maxjag  <=  last  %  can't  do  it  if  too  near  end  file 

lag_array  =  zeros(1  ,max_lag):  %  only  need  this  for  plot  of  lags 
bestjag_sum=inf;  %  initialize  big  so  anything  looks  better 
for  lag  =  min_lag:1:maxjag  %  chg  of  1  in  step  size  =  about  15  Hz 

%  (at  pitch=500).  step  5  =  68  Hz  (too  big). 


Iag_sum=sum(abs(speech_data(1  :n)-speech_data(lag+1  :n+iag))); 

if  lag_sum  <=  best_lag_sum  %  remember  best;  if  tie  use  iower  Hz 
bestjag_sum=lag_sum; 
best_lag=lag; 
end 

lag_array(lag)  =  lag_sum:  %  only  needed  for  lag  plot 
end 

pitch=1  /((bestJag-1  )*(1  /fs));  %  compute  pitch  from  best  lag 

else 

error('Sample  too  small.  Need  at  least  N  + 1  +  ((1/50)/(1/fs))  samples.'); 
end 
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J.  P_CEPS.M 


function  [pitch  ,  confidence]  =  p_ceps(ceps_data,fs,n,L) 

%P_CEPS  Voice  Pitch  using  Cepstral  Peak 
% 

%  This  function  computes  the  pitch  of  a  voice  signal  using  cepstral 
%  coefficients  (first  argument).  The  sample  frequency  is  required  as 
%  the  second  argument,  the  third  must  be  the  window  size,  and  the  fourth 
%  is  the  cutoff  between  the  pitch  peak  and  the  formant  peak.  If  requested, 

%  a  number  indicating  the  prominence  of  the  pitch  peak  is  also  returned 

%  (one  tenth  the  ratio  of  the  peak's  height  to  mean  amplitude  of  the 
%  surrounding  plot).  If  this  value  is  less  than  one,  a  pitch  value  of  zero 

%  is  returned  (unless  the  confidence  number  is  also  requested).  The  second 

%  returned  value  may  be  used  as  an  indicator  of  the  degree  of  voicing,  as 
%  well  as  an  indication  of  the  reliability  of  the  pitch  figure.  Note  that 

%  the  window  size  N  and  Fs  must  be  the  same  as  those  used  to  produce  the 

%  cepstrum.  Because  this  function  is  intended  for  speech  processing,  a 
%  zero  will  be  returned  unless  the  computed  pitch  is  between  50  and  500  Hz. 

% 

% 

%USAGE  EXAMPLE:  cep_vector=cepstrum(data(1  :n),fs,hamming(n)); 

%  [pitch  confidence]=p_ceps(cep_vector,fs,n,round(n/10)): 

% 

if  ~(nargin  =  4) 

error('lnvalid  number  of  arguments  in  P_CEPS'); 
end 

if  length(fs)=0 

error(‘lnvalid  sample  frequency  in  P_CEPS'): 
end 

if  length(ceps_data)  ==  n/2 
rceps=abs(ceps_data): 
elseif  length(ceps_data)  =  n 
rceps=(abs(ceps_data(1  :n/2))); 
else 

error(‘Window  size  and  cepstrum  vector  size  must  agree'); 
end 

confid  =  max(rceps(L:length(rceps)))/(1 0*mean(rceps(L:length(rceps)))); 

%  determine  pitch  (if  strongly  voiced  or  user  asked  for  confid) 
if  (confid  >=  1)l(nargout==2)  %  Try  using  cepstral  peak  to  determine  pitch 
[peak_val,peak_loc]  =  max(rcep(L:length(rcep))); 
cep_pitch=fs/(L+peak_loc); 

if  (cep_pitch<500)&(cep_pitch>50)  %  only  return  pitch  if  in  reasonable  range 
pitch=cep_pitch; 
else 
pitch=0; 
end 
else 
pitch=0; 
end 

if  nargout==2 
confidence=confid: 
end 
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K.  VECTORIZ.M 


function  [p]  =  vectoriz(feature_surf,win_width,win_pos) 

%VECTORIZ  sequence  each  column  from  feature  matrix  into  a  vector 
% 

%  USAGE:  input_vect  =  vectoriz(in_mat,win_width,win_pos) 

% 

%for  i  =  1:30,  sequence  each  column  from  matrix  into  vector 
%  this  function  does  not  use  for  loop  because  MATLAB  for  loops  are  SLOW 

p=zeros(win_width*30,1 ); 

p(1  *win_width-win_width+1 :1  *win_width,1 )  =  feature_surf(win_pos:win_pos+win_width-1 ,1 ); 
p(2*win_width-win_width+1 :2*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,2); 
p(3*win_width-win_width+1 :3*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,3); 
p(4*win_width-win_width+1 :4*win_width,  1 )  =  feature_surf(win_pos:win_pos+win_width-1 ,4); 
p(5*win_width-win_width+1 :5*win_width,1 )  =  feature_surf(win_pos:win_pos+\win_width-1 ,5); 
p{6*win_width-win_width+1 :6*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,6); 
p(7*win_width-win_width+1 :7*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,7); 
p(8*win_width-win_\«idth+1 :8*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,8); 
p(9*win_width-win_width+1 :9*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,9); 
p(1 0*win_width-win_width+1 :1 0*win_width,1 )  =  feature_surf(win_pos:win_pos+win_width-1 ,1 0); 

p(11*win_width-win_width+1 :1 1*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,11); 
p(12*win_width-win_width+1 :12*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,12); 
p(13*win_width-win_width+1 :13*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,13); 
p(14*win_width-win_width+1:14*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1,14); 
p(1 5*win_width-win_width+1 :1 5*win_width,1 )  =  feature_surf(win_pos:win_pos+win_width-1 ,15); 
p(16*win_width-win_width+1 :16*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1,16); 
p{17*win_width-win_width+1 :17*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,17); 
p{18*win_width-win_width+1 :18*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1,18); 
p(19*win_width-win_width+1 :19*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1,19); 
p(20*win_width-win_width+1 :20*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1,20); 

p(21*win_width-win_width+1 :21*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1,21); 
p(22*win_width-win_width+1 :22*win_width,1)  =  feature_surf(winj30s:win_pos+win_width-1 ,22); 
p(23*win_width-win_width+1 :23*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,23); 
p(24*win_width-win_width+1 :24*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,24); 
p(25*win_width-win_width+1 :25*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,25); 
p(26*win_width-win_width+1 :26*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,26); 
p(27*win_width-win_width+1 :27*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,27); 
p(28*win_width-win_width+1 :28*win_width,1 )  =  feature_surf(win_pos:win_pos+win_width-1 ,28); 
p(29*win_width-win_width+1 :29*win_width,1)  =  feature_surf(win_pos:win_pos+win_width-1 ,29); 
p(30*win_width-win_width+1 :30*win_width,1 )  =  feature_surf(win_pos:win_pos+win_width-1 ,30); 
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L.  DIPNUMOF.M 


function[p]  =  dipnumof(dip_name,dip_names) 

%DIPNUMOF  Return  the  index  position  of  a  diphone  name  in  an  array 
%  of  diphone  names.  Uses  binary  search.  Requires  LESSTHAN.M. 
% 

%  Usage:  dip_num  =  dipnumof(dipname,dipnames); 

num_dips=length(dip_names); 
i  =  1: 

k  =  num_dips; 
found  =  0; 

while  (i  <=  k)&(found  ~=  1) 
j  =  round((i+kV2): 

if  strcmp(dip_names(j,:),dip_name)==1 
found=1 ; 

elseif  lessthan(dip_name,dip_names(j,:))=1 
k  =  j-1: 
else 
i  =  j+1: 
end 
end 

P  =  j: 


M.  DIFF_MAT.M 


function  [out_matrix]  =  diff_mat(in_mat) 

%DIF_MAT  Return  a  difference  matrix  for  specified  input  matrix. 

% 

%  Row  i  of  the  output  matrix  is  the  difference  between  rows  i-1  and  i 

%  of  the  input  matrix.  The  resulting  surface  represents  the 

%  rate  of  change  of  the  function  (assuming  row  number  increases  with 
%  time).  Note  that  the  resulting  matrix  will  have  one  less  row  than 
%  input  matrix. 

% 

% 

% 

%USAGE  EXAMPLE:  delta_surface  =  diff_mat(data(1  :maxrow,1  :maxcol)); 

% 


num_rowsJn  =  size(in_mat,1); 
num_cols_in  =  size(in_mat,2): 

result_mat  =  zeros(num_rows_in-1  ,num_cols_in); 

result_mat  =  (in_mat(2:num_rowsJn,:)-in_mat(1  :num_rowsJn-1 ,:)); 

out_matrix  =  result_mat; 
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N.  LD16BIT.M 


function  [y]  =  Id16bit(name, number, offset) 

%LD16BIT  Load  signed  16-bit  binary  data  file. 

%  [Y]  =  LD1 6BIT(’NAME')  load  the  entire  file  ’NAME'  as  1 6-bit 

%  signed  data  samples. 

% 

%  [Y]  =  LD16BITCNAME, NUMBER)  loads  the  first  NUMBER 

%  samples  starting  from  the  beginning  of  the  file. 

% 

%  [Y]  =  LD16BITCNAME', NUMBER, OFFSET)  loads  NUMBER  samples 

%  beginning  at  OFFSET  samples  from  the  beginning  of  the 
%  file.  If  NUMBER=0,  loads  starting  from  OFFSET  to  the  EOF. 

% 

%  See  also  LD1 6BIT,  LD32BIT,  SV8BIT,  SV1 6BIT,  SV32BIT 

%  LT  Dennis  W.  Brown  7-17-93,  DWB  1  -23-94 
%  Naval  Postgraduate  School,  Monterey,  CA 
%  May  be  freely  distributed. 

%  Not  for  use  in  commercial  products. 

%  default  output 

y  =  D: 

%  name  must  be  a  string 
if  ~isstr(name), 

error('ld16bit:  Filename  must  be  a  string  argument!'); 

end; 

%  open  the  file 
fid  =  fopen(name,'rb'); 
if  fid  =  (-1) 

error(['ld16bit:  Could  not  open  file  '  name]); 
end 

%  get  file  length 
fseek(fid,0,'eof'); 
flength  =  ftell(fid); 
fseek(fid,0,’bof'); 
if  nargin  ==  1 

%  read  the  entire  file 
y  =  fread(fid,flength/2,'int16'); 
elseif  nargin  ==  2 

%  read  some  of  the  samples  beginning  at  the  start  of  file 
%  error  check 
if  number  >  flength/2, 
fciose(fid); 

error('ld16bit:  Requested  number  of  samples  greater  than  filelength!'); 

end; 

%  read  samples 
fseek(fid,0,'bof'); 
y  =  fread(fid,number,'int16'); 
elseif  nargin  ==  3, 

%  read  some  of  the  samples  beginning  somewhere  in  the  file 
%  error  check  (offset  sample  is  included  in  number) 
if  number+offset-1  >  flength/2, 
fciose(fid); 

error('ld16bit:  Requested  number+offset  of  samples  greater  than  filelength!'); 

end; 
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if  number  ==  0, 

%  read  to  end-of-file 
fseek(fid,2*(offset-1  ),'bof): 
y  =  fread(fid,flength/2-offset+1 , 'inti  6'); 

else 

%  read  specified  number  of  samples 
fseek(fid,2*(offset-1  ),’bof'): 
y  =  fread(fid, number, 'inti  6'): 

end; 

end; 

fclose(fid); 

O.  LESSTHAN.M 


function[r]  =  Iessthan(str1  ,str2) 

%LESSTHAN  Return  1  if  diphone  name  in  strl  is  alphabetically 
%  less  than  diphone  name  in  str2. 

%  STRINGS  MUST  BE  7  CHARACTERS  LONG! 

% 

%  Usage:  if  lessthan(dip_names(i,1:7),dip_names(j,1:7))  ==  1;  break;  end 
% 

colvals  =  [10000  1000 100  10  1  0.1  0.01];  %  used  to  convert  MATLAB's  "<" 

%  result  (a  vector)  to  a  alpha 
%  comparison.  Crude,  but  it  works 
%  (for  7  char  strings  anyway). 

r  =  sum(colvals.*(str1  <  str2))  >  sum(colvals.*(str1  >  str2)); 


P.  STRCAT.M 


function  [out_str]  =  strcat(left,right) 

%STRCAT  Append  two  strings. 

% 

%  USAGE:  answer_str  =  strcat('Answer  is  ',int2str(result)); 

% 


left_len=length(left); 

rightjen=length(right); 

newjast=leftjen+rightjen; 

out_str  =  blanks(new_last); 

out_str(1  :left_len)=left(1  :leftjen); 

out_str(left_len+1  :new_last)=right(1  :rightjen); 
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APPENDIX  B.  ADA  CODE 


A.  CONTENTS 

1.  DIPCOUNT.ADA 

This  utility  counts  the  number  of  occurrences  of  each  diphone  in  a  phonetic  dictionary. 

2.  DIPDURS.ADA 

This  utility  gathers  information  on  the  maximum  durations  of  diphones  in  a  list  of  phonetic 
transcription  files. 

3.  DIPNUMS.ADA 

This  utility  counts  the  number  of  occurrences  of  diphones  in  the  phonetic  transcription 
files.  This  tells  how  many  exemplars  are  available  for  training  and  testing. 

4.  MAKFLIST.ADA 

This  utility  produces  a  list  containing  the  requested  number  of  speech  (.WAV)  files 
containing  the  designated  diphones.  The  output  is  used  by  MAKVECTS.M. 

5.  VOCAB.ADA 

This  utility  determines  the  vocabulary  that  can  be  built  from  a  given  list  of  diphones  (for  a 
particular  phonetic  dictionary). 

6.  MAKWLIST.ADA 

This  program  produces  a  list  of  speech  (.WAV)  files,  and  the  words  they  contain,  for  use 
by  RECOGNIZ.M  (in  recognizing  diphones  prior  to  word  tests). 

7.  PHNDURS.ADA 

This  program  gathers  statistics  on  phoneme  durations  from  phonetic  transcription  files. 
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B.  USING  THE  UTILITIES 

The  utilities  in  the  appendix  were  used  as  follows: 

1.  A  directory  listing  of  all  TIMU  SX  phonetic  transcription  (.PHN)  files  was  redirected  to  a 

text  file  (eg.  by  typing  "dir  /b  /s  *.phn  >  phnfiles.txt"  at  the  DOS  command  prompt  or 
"find"  at  the  UNDC  prompt).  The  list  of  transcription  files  was  used  by  DIPNUMS.ADA 
to  create  a  list  of  all  diphones  found  in  the  PHN  files.  The  output  file  was  in  the 
following  format  (the  numbers  refer  to  start,  phoneme  junction,  and  ending  sample 
number):  e:\timit\train\drl\falk0\sxl32.wav  b-r  0  2200  4300 

2.  The  list  produced  by  the  preceding  step  was  sorted  by  an  arbitrary  column  (one  of  the 
sample  number  fields)  in  order  to  scramble  the  speech  files.  Otherwise,  speakers  from 
the  first  few  dialect  regions  would  have  dominated  the  training  and  test  files  selected  in 
the  next  step  (because  they  are  early  in  the  directory  listing). 

3.  The  shuffled  file  list  from  the  preceding  step  was  used  by  MAKFLIST.ADA  to  produce  a 
final  list  of  the  speech  files  to  be  used.  This  utility  also  required  a  file  (produced  by 
DIPNUMS.ADA)  listing  the  diphones  to  be  included  in  the  data  set . 

4.  The  file  produced  by  MAKFLIST.ADA  was  used  by  a  MATLAB  script  file 
(MAKVECTS.M)  to  locate  and  compute  feature  vectors  from  the  actual  TIMIT  speech 
files. 

5.  Both  DIPNUMS.ADA  and  MAKFLIST.ADA  require  an  integer  prescribing  how  many 
occurrences  of  each  diphone  to  include. 

6.  Lists  of  the  resulting  feature  vector  files  (again,  produced  by  redirecting  directory  listings) 
were  used  by  TRNBPX.M  and  TESTNET  to  locate  the  feature  vector  files. 

7.  After  the  network  was  trained,  VOCAB.ADA  was  used  to  produce  a  list  of  words  in  the 
vocabulary  achievable  DIPLIST.TXT. 

8.  The  vocabulary  list  produced  by  VOCAB.ADA  was  used  by  MAKWLIST.ADA  to  build 
a  test  set  (list  of  .WAV  files  and  their  word  locations)  for  use  by  RECOGNIZ.M. 
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C.  DIPCOUNT.ADA 


__♦★***★*★******★**★****★**★★***★**★★★******★***********★*★**★★***★★*★*★ 

PROGRAM  NAME:  DipCount  (DipCount.ada)  ^ 

-**  AUTHOR:  Maj  Mark  Cantrell,  USMC  (cantrell@nps.navy.mil) 

--**  DESCRIPTION:  Program  for  reading  a  phonetic  dictionary  &  counting 

-**  diphones. 

** 

--**  DATE  MODIFIED:  3  July,  1 995  (see  below  for  version  number) 

^^★★★★★★★★★★★★★★★★★★★★★★♦★★★★★★★♦★★★★★★★******************************** 

**★**★*★★*★★****★************★**«***★***★**★*★**********★★**★******★*★ 


with  Arg, 

Text_IO: 

use  Text_IO; 

procedure  DipCount  is 

Max_Str_Len  :  constant  :=  255; 

Max_Phn_Len  :  constant  :=  5; 

CoLWidths  :  constant  :=  6; 

Usage_Error :  exception; 

Type  STR  is  record 
Txt :  String(1..Max_Str_Len); 

Len  :  Integer  range  O..Max_Str_Len  :=  0; 
end  record; 

Temp,Tail,Atom,Atom_Str,TaiLStr,Phon_Str :  STR; 

Type  Phoneme  is  (AA,AE,AH,AO,AW,AY,B,CH,D,DH,EH,ER,EY,  --  1st  3  lines  from  CMU  lex 
F,G,HH,IH,IY,JH,K,L,M,N,NG,OW,OY,P,R, 

S,SH,T,TH,UH,UW,V,W,Y,Z,ZH, 

EM,EN,ENG,EL,AX,IX.AXR,HX,  -  this  line  from  TIMIT 
XXX);  -  use  XXX  for  unknown  codes 

Diphone_Matrix  :  array(Phoneme,Phoneme)  of  natural  := 

(others  =>  (others  =>  0)); 


This_Phoneme,Last_Phoneme :  Phoneme; 

Debugging  :  boolean  :=  false; 

lnput_Filename,Output_FileName  :  string(1..Max_Str_Len)  :=  (others  => ' '); 
Name_Last :  Natural; 

lnput_Line :  string(1..Max_Str_Len); 
lnput_Last :  natural; 
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Version  :  constant  String  :=  "DipCount  Version  #1.0"; 
lnput_DataFile,Output_DataFile :  TextJO.File_Type: 

^jkit-k**********’k**icit*itit********'k*itik****it**i[1t*itit*it**ic*itifk*1t*ic*it'ti'k-kit**ifkifk-kifk* 

GET_ATOM  -  Gets  next  atom  from  a  str 

it*ifkiHt*****'k-k1fk*-kiiit*ifkifkit*-k*itit**1r*it1iitit****1fk*****-ie***it****flt***-k***itit****it* 


Procedure  Get_Atom{ln_Str :  Str;Atom_Str,TaiLStr :  out  Str)  is 
i,a_start,a_end,a_len,tjen :  integer  :=  1; 

Begin 

If  ln_Str.Len  >  0  then 

While  (ln_Str.Txt(i)  =  * ')  and  (i  <  ln_Str.Len)  loop 
i  :=  i  + 1; 
end  loop; 

If  ln_Str.Txt(i)/="then 

3  StBft  I—  i‘ 

While  (ln_Str.Txt(i)  /= '  ’)  and  (i  <  ln_Str.Len)  loop 
i  :=  i  +  1; 
end  loop; 

if  ln_Str.Txt(i)  =  ‘  ‘  then 
a_end  :=  i  - 1; 
else 

a_end  :=  i; 
end  if; 

ajen  :=  a_end  -  a_start  +  1; 
t_len  :=  ln_Str.Len  -  a_end; 

TaiLStr.Len  :=  tjen; 

Atom_Str.Len  :=  a_len; 
for  i  in  1  ..aJen  loop 

Atom_Str.Txt(i)  :=  ln_Str.Txt(a_start  +  i  - 1); 
end  loop; 

for  i  in  1  ..tJen  loop 

Tail_Str.Txt(i)  :=  ln_Str.Txt(a_end  +  i); 
end  loop; 

else 

Atom_Str.Len  :=  0; 

TaiLStr.Len  :=  0; 
end  if; 
else 

Atom_Str.Len  :=  0; 

TaiLStr.Len  :=  0; 
end  if; 

end  GeLAtom; 


GET_PHONEME  -  Returns  Phoneme  corresponding  to  an  atom  (atom  must  be  free 
-*  of  leading  &  trailing  blanks  &  other  garbage  (accent  digits  on  end  OK) 


Function  Get_Phoneme  (ln_Atom :  Str)  return  Phoneme  is 
i.Phn_End :  Integer; 

Phn_String  :  String  (1  ..Max_Phn_Len); 

OuLPhoneme :  Phoneme; 

Begin 

Phn_End  :=  ln_Atom.Len; 
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if  ln_Atom.Len  >  0  then 

While  ((ln_Atom.Txt(Phn_End)  in  '0'..'9'))  and  (Phn_End  >  1)  loop 
Phn_End  :=  Phn_End  - 1 ; 
end  loop; 
end  if; 

If  (Phn_End  >  0)  and  (Phn_End  <=  Max_Phn_Len)  then 
Phn_String(1..Phn_End)  :=  ln_Atom.Txt(1..Phn_End); 
begin  --  block  to  catch  constraint  error  if  not  a  valid  phoneme 
Out_Phoneme  :=  Phoneme‘Value(Phn_String(1  ..Phn_End)); 
return  Out_Phoneme; 
exception 

when  constraint_error  => 
return  XXX; 
end;  -  block 
else 

return  XXX; 
end  if; 

end  Get_Phoneme; 


^^*************W*******«*******4r*******4r************************************ 

-*  PUT_PHON  -  A  Put  routine  that  translates  from  my  enum 
-*  representations  to  standard  symbols 

******it*-k*****1t*1t*****-kitit**ii**ititit**ifkiftfk1fk***-kit1fkitifkifk‘tfk-k**it1HHt**it*it*****it 


Procedure  Put_Phon(0_File  :  File_Type;  Phon  :  Phoneme)  is 
begin 

If  Phon  =  HX  then 
Put(0_File,"H#”); 
elsif  Phon  =  XXX  then 
Put(0_File, "???“); 
else 

Put(0_File,Phoneme'image(Phon)); 
end  if; 

end  Put_Phon; 

__*******************«****«*4r******4r***********4r4r*************************** 

-*  DISPLAY_MATRIX  -  Prints  final  diphone  matrix  to  output  file  (in  2  pieces) 


Procedure  Display_Matrix  is 
row,col,mid_col :  Phoneme; 

Count :  natural  :=  0; 
begin 

Put  Lin0(Output  OdtsfilG  “*******************’****************************************♦************♦**♦***”' 
Put_Line(OutputlDatafile’"  DIPHONE  COUNTS  FOR  FILE  "  &  lnput_Filename); 

Put  LinG(Output  DdtQfilG  “***************♦*****************************************♦****★*****★★********★«’ 


mid_col  :=  Phoneme'val(integer((Phoneme'pos(Phoneme'last))/2)): 
new_line(Output_Datafile); 


-  left  half  of  matrix 

for  col  in  Phonemefirst..Mid_Col  loop  -  column  labels 
set_col(Output_Datafile,coLwidths*(1  +  Phoneme'pos(col))); 
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Put_Phon(Output_Datafile,Col); 
end  loop; 

new_line(Output_Datafile,2); 
for  row  in  Diphone_Matrix'range  loop 
Put_Phon(Output_Datafile,Row):  --  row  label 

for  col  in  Phonemefirst..Mid_Col  loop 

set_col(Output_Datafile,col_widths*{1  +  Phoneme'pos(col))); 
put(Output_Datafile,integer’image(Diphone_Matrix(row,col))); 

if  Diphone_Matrix(row,col)  >  0  then  --  count  nonzero  entries 
count  ;=  count  +  1 ; 
end  if; 

end  loop; 

new_line(Output_Datafile); 
end  loop; 

new_line(Output_Datafile,2); 


--  right  half  of  matrix 

for  col  in  Phoneme'succ(Mid_Col)..Phoneme'last  loop  --  col  labels 
set_col(Output_Datafile,col_widths*(Phoneme'pos(col)  -  Phoneme'pos(Mid_Col))); 
Put_Phon(Output_Datafile,col); 
end  loop; 

new_line(Output_Datafile,2); 
for  row  in  Diphone_Matrix'range  loop 
Put_Phon(Output_Datafile,Row);  -  row  label 
for  col  in  Phoneme'succ(Mid_Col)..Phoneme'last  loop 

set_col(Output_Datafile,coLwidths*(Phoneme'pos(col)  -  Phoneme'pos(Mid_Col))); 
put(Output_Datafile,integer’image(Diphone_Matrix(row,col))); 

if  Diphone_Matrix(row,col)  >  0  then  -  count  nonzero  entries 
count  :=  count  + 1 ; 
end  if; 


end  loop; 

new_line(Output_DatafiIe); 
end  loop; 


New_Line(Output_Datafile,3); 

Put  LinG(Output  DstsfilG  “*******************************************************************************“j. 
Put_Line(Output_Datafile,"  Total  Diphones  with  nonzero  entries  = "  &  integer'image(count)); 


Put_Line(Output_Datafile, 


): 


New_Line(Output_Datafile,3); 


--  Now  list  nonzero  entries 

Put_Line(Output_Datafile, "NONZERO  ENTRIES  ARE;"); 
New_Line(Output_Datafile); 

for  row  in  Diphone_Matrix'range  loop 
for  col  in  Phoneme'first.Phoneme'last  loop 
if  Diphone_Matrix{row,col)  >  0  then 
Put_Phon(Output_Datafile,Row); 
set_col(Output_Datafile,coLwidths); 
Put_Phon(Output_Datafile,Col); 
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set_col(Output_Datafile,(3*coLwidths)-(integer'image(Diphone_Matrix(row,col))'length)); 
put(Output_Datafile,integer‘image(Diphone_Matrix(row,col))); 
newJine(Output_Datafile); 
end  if; 
end  loop; 
end  loop; 

end  Display_Matrix; 

__************************************************************************** 

-*  GET_FILE  -  Prompts  user  for  filename,  opens  file,  &  returns  name  &  handle 


Procedure  Get_File(Prompt_String :  String: 

F_Name :  out  String; 

F_Mode :  File_Mode; 

F_Handle  :  in  out  File_Type)  is 
Str_Last :  Integer; 

Str_Text :  String(1..Max_Str_Len)  :=  (others  => ' '); 
begin 


loop  "  loop  until  valid  filename  entered  or  user  aborts  (^) 

Begin  -  block  statements  to  allow  another  try  if  bad  filename  entered 

Put(Prompt_String); 

Get_Line(Str_T  ext,Str_Last) ; 


if  F_Mode  =  ln_File  then 
Text_IO.Open(File  =>  F_Handle, 

Mode  =>  F_Mode, 

Name  =>  Str_Text(1..Str_Last)): 

else 

Text_IO.Create(File  =>  F_Handle, 

Mode  =>  F_Mode, 

Name  =>  Str_Text(1..Str_Last)); 

end  if; 

F_Name  :=  Str_Text; 


exception 

When  Name_Error  =>  Put(Ascii.Bel); 

Put_Line("***  Unable  to  open  “  &  Str_Text(1..Str_Last)); 
New_Line; 

When  others  =>  raise; 

end; 

end  loop; 
end  Get_File: 
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_ ★★*****♦★★★***★*★★*****★*★***★★★****★*★★★★******★***★**♦**★*******★★*★**★* 

-*  DIPCOUNT  Main  Program 

__**********************************★*************★*****★***♦*************** 


Begin  --  DipCount 

if  (Arg.Count  >  2)  and  then  (Arg.Data(3)  =  7d"  or  -  process  command  line 

Arg.Data(3)  =  7D"  or 
Arg.Data(3)  =  "-d"  or 
Arg.Data(3)  =  "-D"  or 
Arg.Data(3)  =  "d“  or 
Arg.Data(3)  =  “D")  then 

Debugging  :=  True; 

elsif  Arg.Count  >  2  then  --  note  that  program  name  is  1st  entry  in  array 
raise  Usage_Error: 
end  if; 

New_Line(25);  --  biank  lines  &  program  "banner" 

Put_Line  (Version); 

New_Line{2); 

if  (Arg.Count  >  1)  then  -  if  input  fiiename  on  cmd  line,  open  it.  eise  ask 
begin 

Text_IO.Open(Fiie  =>  lnput_DataFile, 

Mode  =>  ln_Fiie, 

Name  =>  Arg.Data(2)); 

lnput_Filename(1  ..Arg.Data(2)'length)  :=  Arg.Data(2); 
exception  -  handle  bad  filenames  from  command  line 

When  Name_Error  I 
Status_error  i 
Mode_error  i 
Use_error  I 

Device_error  =>  New_Line(3); 

Put_Line("***  Unable  to  open "  &  Arg.Data(2)); 

New_Line; 

raise  Usage_Error; 

end; 

eise 

Get_File("Enter  name  of  input  file: ", 
lnput_Filename, 
ln_File, 

lnput_Datafile); 

end  if; 


-  get  output  filename  from  user 
Get_File("Enter  name  of  output  file: ", 

Output_Filename,, 

Out_File, 

Output_Datafile); 


If  Debugging  then 
New_Line; 

Put_Line(" - [  PROCESSING  FILE  ] - "); 

end  if; 
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While  NOT  End_Of_File(lnput_DataFile)  loop 
Get_Line(lnput_DataFile,lnput_Line,lnput_Last); 

--  if  not  a  comment  line 

if  lnput_Last  >  0  and  then  lnput_Line(1)  in  'A'..'z'  then 
if  debugging  then 

Put_Line(lnput_Line(1..lnput_Last)); 
end  if; 

for  i  in  1  ..lnput_Last  loop  ~  remove  slashes,  don't  need  them, 
if  lnput_Line(i)  =  't  then 
lnput_Line(i)  := ' 
end  if; 
end  loop; 

Temp.Txt(1..lnput_Last)  :=  lnput_Line(1..lnput_Last); 

Temp.Len  :=  lnput_Last; 

Get_Atom(Temp, Atom, Tail);  --  get  &  discard  the  word 
if  Debugging  then 

Put_Line("Word  = "  &  Atom.Txt(1  ..Atom.Len)); 
end  if; 


Last_Phoneme  :=  HX;  --  Consider  silence  &  first  phoneme  to  be  a  diphone 

loop  --  for  each  phoneme  on  the  line  (in  the  word) 

Temp  :=Tail; 

Get_Atom(Temp, Atom, Tail); 
if  Atom.Len  >  0  then 

This_Phoneme  :=  Get_Phoneme(Atom); 

if  Debugging  then 
PuLLineC  Diphone  =  "  & 

Phoneme'lmage(Last_Phoneme)  & "/"  & 
Phoneme'lmage(This_Phoneme)); 

end  if; 


-  increment  counter  for  diphone  formed  by  this  &  prior  phoneme 
Diphone_Matrix(Last_Phoneme,This_Phoneme)  := 
Diphone_Matrix(Last_Phoneme,This_Phoneme)  + 1; 

Last_Phoneme  :=  This_Phoneme; 
end  if; 

exit  when  Tail.Len  <=  0; 
end  loop; 


if  Debugging  then 

Put_Line("  Diphone  = "  & 
Phoneme’lmage(Last_Phoneme)  & "/"  & 
Phoneme'lmage(HX)); 

end  if; 
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-  Consider  last  phoneme  &  silence  to  be  a  diphone 
Diphone_Matrix(Last_Phoneme,HX)  := 

Diphone_Matrix(Last_Phoneme,HX)  + 1 ; 


end  if; 
end  loop: 

Display_Matrix; 

If  Debugging  then 

Put_Line(" - [  DONE  ] - “); 

New_Line; 
end  if; 

Text_IO.Close(lnput_Datafile); 

Text_IO.CIose(Output_Datafile): 


exception 

When  Usage_Error  => 

Put(Ascii.Bel): 

New_Line(3); 

Put_Line("Usage:  DipCount  infilename  [-d]"); 
New_Line: 

Put_Line("  -d  =  enable  debugging"); 

New_Line; 

return; 


end  DipCount; 
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D.  DIPDURS.ADA 


__*********************************4r***«****************«*****«'*>******** 

-•*  PROGRAM  NAME:  DipDurs  (DipDurs.ada) 

— ** 

-••AUTHOR:  Maj  Mark  Cantrell,  USMC  (cantrell@nps.navy.mil) 

— ** 

-••  DESCRIPTION:  Program  for  reading  a  list  of  phonetic  transcription 
“**  files  &  recording  max  diphone  durations  listed 

-**  DATE  MODIFIED:  17  Sep,  1995  (see  below  for  version  number) 
__*★★*★*★*************★************************♦*★*******★***********««* 


with  Arg, 

Text_IO: 

use  TextJO; 

procedure  DipDurs  is 

Max_Str_Len  :  constant  :=  255; 

Max_Phn_Len :  constant  :=  5; 

CoLWidths  :  constant  :=  6; 

Fs  :  constant  :=  16000.0; 

FList_Filename  :  constant  string  :=  "DIPFLIST.TXT"; 

Usage_Error :  exception; 

Type  STR  is  record 
Txt :  String(1..Max_Str_Len); 

Len :  Integer  range  O..Max_Str_Len  :=  0; 
end  record; 

Temp,Tail,Atom,Atom_Str,TaiLStr,Phon_Str :  STR; 

Type  Phoneme  is  (AA,AE,AH,AO,AW,AY,B,CH,D,DH,EH,ER,EY,  -  1st  3  lines  from  CMU  lex 
F,G,HH,IH,IY,JH,K,L,M,N,NG,OW,OY,P,R, 

S,SH,T,TH,UH,UW,V,W,Y,Z,ZH, 

EM,EN,ENG,EL,AX,IX,AXR,HX,  -  this  line  from  TIMIT 
XXX);  -  use  XXX  for  unknown  codes 

Diphone_Matrix :  array(Phoneme,Phoneme)  of  natural  := 

(others  =>  (others  =>  0)); 


Max_Dip_X,Max_Dip_Y :  Phoneme  :=  HX; 

Max_Du  ration  :  Integer  :=  0; 

Max_Fname :  Str; 

Debugging :  boolean  :=  false; 

lnput_Filename,Output_FileName  :  string(1..Max_Str_Len)  :=  (others  => ' '); 
Name_Last :  Natural; 
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lnput_Line :  string(1  ..Max_Str_Len); 
lnput_Last :  natural; 

Version  :  constant  String  :=  "DipDurs  Version  #1 .0"; 
lnput_DataFile,Output_DataFile,FList_File :  Text_IO.File_Type: 


^jk***********************************************************************-^* 

UPPER_CASE  -  Returns  upper  case  version  of  input  string 

Function  Upper_Case(ln_String  :  string)  return  string  is 
Result_String :  String(ln_Stringfirst..ln_String'last): 

Begin 

for  index  in  ln_String'first..ln_String'last  loop 
If  ln_String(index)  in  'a'..‘z'  then 

Result_String{index)  :=  character’val(character‘pos(ln_String(index)) 
-  character‘pos('a') 

+  character'pos('A')); 

else 

Result_String(index)  :=  ln_String(index): 
end  if; 

end  loop; 

return  Result_String; 
end  Upper_Case; 


--*  GET_ATOM  -  Gets  next  atom  from  a  str 

^^*******'kk***i(k**-kii*it*****it*1t1t***it***kk*1Htitifkifkk*itifk*-k**ititk*it*itifkiHHt1t1tie1t*** 


Procedure  Get_Atom{ln_Str :  Str;Atom_Str,TaiLStr :  out  Str)  is 
i,a_start,a_end,a_len,tjen  :  integer  :=  1; 

Begin 

If  ln_Str.Len  >  0  then 

While  (ln_Str.Txt(i)  = ' ')  and  (i  <  ln_Str.Len)  loop 
i  :=  i  + 1 ; 
end  loop; 

If  ln_Str.Txt(i)  /= ' '  then 
a_start  :=  i; 

While  (ln_Str.Txt(i)  /= ' ')  and  (I  <  ln_Str.Len)  loop 
i  :=  i  +  1 ; 
end  loop; 

if  ln_Str.Txt(i)  =  ‘  ’  then 
a_end  :=  i  - 1 ; 
else 

a_end  :=  i; 
end  if; 

a_len  :=  a_end  -  a_start  + 1 ; 
t_len  :=  ln_Str.Len  -  a_end; 

TaiLStr.Len  :=  tjen; 

Atom_Str.Len  :=  a_len; 
for  i  in  1  ..ajen  loop 

Atom_Str.Txt(i)  :=  ln_Str.Txt(a_start  +  i  - 1); 
end  loop; 

for  i  in  1  ..tJen  loop 
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Tail_Str.Txt(i)  :=  ln_Str.Txt(a_end  +  i); 
end  loop; 

else 

Atom_Str.Len  :=  0; 

TaiLStr.Len  :=  0; 
end  if; 
else 

Atoni_Str.Len  :=  0; 

TaiLStr.Len  :=  0; 
end  if; 

end  Get_Atom; 

"*  GET_PHONEME  -  Returns  Phoneme  corresponding  to  an  atom  (atom  must  be  free 
of  ieading  &  trailing  blanks  &  other  garbage  (accent  digits  on  end  OK) 
__★*★**♦★*★***★♦****★★******★***★*★★★*******★★★♦★★******★****★*★**★********★ 


Function  Get_Phoneme  (ln_Atom :  Str)  return  Phoneme  is 
i,Phn_End :  Integer; 

Phn_String :  String  (1..Max_Phn_Len); 

Out_Phoneme :  Phoneme; 

Begin 

Phn_End  :=  ln_Atom.Len; 
if  ln_Atom.Len  >  0  then 

Whiie  ((ln_Atom.Txt(Phn_End)  in  '0'..'9'))  and  (Phn_End  >  1)  loop 
Phn_End  :=  Phn_End  - 1 ; 
end  loop; 
end  if; 

If  (Phn_End  >  0)  and  (Phn_End  <=  Max_Phn_Len)  then 
Phn_String(1..Phn_End)  :=  Upper_Case(ln_Atom.Txt(1..Phn_End)); 

begin  --  block  to  catch  constraint  error  if  not  a  vaiid  phoneme 
OuLPhoneme  :=  PhonemeValue(Phn_String(1  ..Phn_End)); 
return  OuLPhoneme; 
exception 

when  constrainLerror  => 

if  Phn_End  =  2  and  then  Phn_String(1..2)  =  "H#"  then 
return  HX; 
else 

return  XXX; 
end  if; 

end;  -  block 
else 

return  XXX; 
end  if; 

end  GeLPhoneme; 

-*  ECHO_DATA  -  Used  by  Process_Phn_File  to  echo  the  data  read  from  file 


Procedure  Echo_Data  (Start_N,  End_N  :  Long_lnteger; 

Phon_Str :  Str; 

Phoneme_Read :  Phoneme)  is 
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begin 


if  Debugging  then 

Put_Line(“  Start  = "  &  longJnteger'image(Start_N)); 
Put_Line("  End  = "  &  long_integer'image(End_N)): 

Put_Line("  Phoneme  &  Phon_Str.Txt(1  ..Phon_Str.Len)  & 

=  "  &  Phoneme'image(Phoneme_Read)); 

end  if; 

end  Echo_Data; 

PUT_PHON  -  A  Put  routine  that  translates  from  my  enum 
"*  representations  to  standard  symbols.  Used  by  Display_Matrix. 


Procedure  Put_Phon(0_File :  File_Type;  Phon :  Phoneme)  is 
begin 

If  Phon  =  HX  then 
Put(0_File,"Hr): 
elsif  Phon  =  XXX  then 
Put(0_File, “???'■): 
else 

Put(0_File,Phoneme'image(Phon)): 
end  if; 

end  Put_Phon; 


-*  DISPLAY_MATRIX  -  Prints  final  diphone  matrix  to  output  file  (in  2  pieces) 


Procedure  Display_Matrix  is 
row,col,mid_col :  Phoneme; 

Count :  natural  :=  0; 
begin 

Put  Lin6(Output  DsitdfilG  "***********************************************************’'^****************************‘'^- 
"PuLLineCOulpuLDat’afile,”  MAX  DIPHONE  DURATIONS  FOR  FILES  LISTED  IN  "  & 

lnput_Filename); 

Lin©(Output  DHtsfilo  *'***’*'************************************************************************************"y 
mid_col  :=  PhonemeVal(integer((Phoneme'pos(Phoneme'last))/2)); 
newJine(Output_Datafile); 


-  left  half  of  matrix 

for  col  in  Phonemefirst..Mid_Col  loop  --  column  labels 
set_col(Output_Datafile,col_widths‘(1  +  Phoneme'pos(col))); 
Put_Phon(Output_Datafile,Col); 
end  loop; 

new_llne(Output_Datafile,2); 
for  row  in  Diphone_Matrix'range  loop 
Put_Phon(Output_Datafile,Row);  -  row  label 
for  col  in  Phonemefirst..Mid_Col  loop 

set_col(Output_Datafile,coLwidths*(1  +  Phoneme'pos(col))); 
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put(Output_Datafile,integer‘image(Diphone_Matrix(row,col))); 

if  Diphone_Matrix(row,col)  >  0  then  --  count  nonzero  entries 
count  :=  count  + 1 ; 
end  if; 

end  loop; 

newJine(Output_Dataf  ile) ; 
end  loop; 

newJine(Output_Datafile,2); 


--  right  half  of  matrix 

for  col  in  Phoneme'succ(Mid_Col)..Phoneme'last  loop  ~  col  labels 
set_col(Output_Datafile,coLwidths*(Phoneme'pos(col)  -  Phoneme‘pos(Mid_Col))); 
Put_Phon(Output_Datafile,col); 
end  loop; 

new_line(Output_Datafile,2); 
for  row  in  Diphone_Matrix'range  loop 
Put_Phon(Output_Datafile,Row);  --  row  label 
for  col  in  Phoneme'succ(Mid_Col)..Phoneme'last  loop 

set_col(Output_Datafile,coLwidths*(Phoneme'pos(col)  -  Phoneme'pos(Mid_Col))); 
put(Output_Datafile,integer'image(Diphone_Matrix{row,col))); 

if  Diphone_Matrix(row,col)  >  0  then  -  count  nonzero  entries 
count  :=  count  +  1; 
end  if; 

end  loop; 

new_line(Output_Dataf  ile) ; 
end  loop; 

New_Line{Output_Datafile,3); 

Put  Lin6(Output  Ddtsf il6  “**********’*^*************************************************************^***************“j  • 
New_Line(Output_Dataf  ile) ; 

Put_Line(Output_Datafile,"  Total  Diphones  with  nonzero  entries  =  "  &  integer'image(count)); 

New_Line(Output_Datafile) ; 

Put_Line(Output_Datafile,"  Maximum  diphone  duration  data:"); 

Put(Output_Datafile,"  Diphone  = "); 

Put_Phon(Output_Datafile,Max_Dip_X); 

Put(Output_Datafile,"/''); 

Put_Phon(Output_Datafile,Max_Dip_Y); 

New_Line(Output_Dataf  ile) ; 

Put_Line(Output_Datafile,"  Duration  (samples)  = "  &  integer'image(Max_Duration)); 
Put(Output_Datafile,"  Duration  (milliseconds)  = "); 

Put(Output_Datafile,integer'image(integer(1 000.0*(float(Max_Duration)/Fs)))); 

New_Line(Output_Dataf  ile) ; 

Put_Line(Output_Datafile,"  Found  in  file: "  &  Max_Fname.Txt(1  ..Max_FName.Len)); 
New_Line(Output_Datafile) ; 

Put__Line(Output_Datafile,“*************’*'**************************************************************************"j- 

New_Line(Output_Datafile,3); 


--  Now  list  nonzero  entries 
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Put_Line(Output_Datafile, "NONZERO  ENTRIES  ARE:"); 

New_Line(Output_Datafile); 

for  row  in  Diphone_Matrix’range  loop 
for  col  in  Phoneme'first..Phoneme‘last  loop 
if  Diphone_Matrix(row,col)  >  0  then 
Put_Phon(Output_Datafile,Row); 
set_col(Output_Datafile,coLwidths): 

Put_Phon(Output_Datafile,Col); 

set_col(Output_Datafile,(3*coLwidths)-(integer'image(Diphone_Matrix(row,col))'length)); 
put(Output_Datafile,integer'image(Diphone_Matrix(row,col))); 
newJine{Output_Datafile); 
end  if; 
end  loop; 
end  loop; 

end  Display_Matrix; 

GET_FILE  -  Prompts  user  for  filename,  opens  file,  &  returns  name  &  handle 

^_W******W**W*******************************************ilr******************^ 


Procedure  Get_File(Prompt_String :  String; 

F_Name :  out  String; 

F_Mode :  File_Mode; 

F_Handle  :  in  out  FileJType)  is 
Str_Last :  Integer; 

Str_Text :  String(1..Max_Str_Len)  :=  (others  =>  ' '); 
begin 


loop  -  loop  until  valid  filename  entered  or  user  aborts  (-^C) 

Begin  -  block  statements  to  allow  another  try  if  bad  filename  entered 

Put(Prompt_String): 

Get_Line(Str_Text,Str_Last); 

if  F_Mode  =  ln_File  then 
Text_IO.Open(File  =>  F_Handle, 

Mode  =>  F_Mode, 

Name  =>  Str_Text(1..Str_Last)); 

else 

Text_IO.Create(File  =>  F_Handle, 

Mode  =>  F_Mode, 

Name  =>  Str_Text(1  ..Str_Last)); 

end  if; 

F_Name  :=  Str_Text; 


exception 

When  Name_Error  =>  Put(Ascii.Bel); 

Put_Line("***  Unable  to  open "  &  Str_Text(1..Str_Last)): 
New_Line; 

When  others  =>  raise; 
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end; 

end  loop; 
end  Get_File; 


^^************************'****4r**«>**«********4r4r*********4r******************* 

-*  RECORD_DURATION  -  Used  by  Process_Phn_File  to  record  a  diphone's  duration 
“*  in  Diphone_Matrix.  Durations  of  HX  (silence)  &  XXX  (unknown)  not  counted 

__^***********************W************************************************** 


Procedure  Record_Duration  (Start_N,  Mid_N,  End_N  :  Long_lnteger; 

First_Phon,Second_Phon :  Phoneme; 
Phon_Filename :  Str)  is 

Dip_Duration :  Integer; 
m_Sec :  Integer; 

LocaLFName :  Str; 

Valid_Dip :  Boolean  :=  false;  -  only  valid  dips  count  for  meix  duration 
begin 

if  (First_Phon  =  HX)  or  (First_Phon  =  XXX)  then 
Dip_Duration  :=  integer(End_N  -  Mid_N); 
elsif  (Second_Phon  =  HX)  or  (Second_Phon  =  XXX)  then 
Dip_Duration  :=  integer(Mid_N  -  Start_N); 
else 

Dip_Duration  :=  integer(End_N  -  Start_N); 

Valid_Dip  :=  True; 
end  if; 

if  Dip_Duration  >  Diphone_Matrix(First_Phon,Second_Phon)  then 
Diphone_Matrix(First_Phon,Second_Phon)  :=  Dip_Duration; 
end  if; 

if  Debugging  then 
Put_Line("Diphone  = "  & 

Phoneme'lmage(First_Phon)  & "/"  & 
Phoneme'lmage(Second_Phon)  & 

“  duration  =  “  & 
integer’image(Dip_Duration)); 

New_Line; 
end  if; 


--  if  new  global  max,  record  it 
If  Valid_Dip  and  (Dip_Duration  >  Max_Duration)  then 
Max_Duration  :=  Dip_Duration; 

Max_Dip_X  :=  First_Phon; 

Max_Dip_Y  :=  Second_Phon; 

Max_Fname  :=  Phon_Filename; 
end  if; 

end  Record_Duration; 


^^********************************w********************.******^^*********^**^* 

“*  PROCESS_PHN_FILE  -  Opens  designated  file  &  records  diphone  durations 

_^*****************w************************^t****^^*****************^****^*^^ 


Procedure  Process_Phn_File  (Phn_Filename  :  string)  is 
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Phn_File :  TextJO.File_Type; 

Last_PhnAtom,Phn_FName :  Str; 

This_Start,This_End,Last_Start,Last_End,Closure_Start,Closure_End :  longjnteger; 
Closure_Pending  :  Boolean  :=  False; 

Phn_Llne :  string(1..Max_Str_Len): 

Phn_Last :  natural; 

This_Phoneme,Last_Phoneme,Closure_Phoneme :  Phoneme; 

Closure_String ;  String(1  ..3); 


begin 


-  set  up  filename  str  for  Record_Duration  calls 
Phn_FName.Txt(1  ..Phn_Filename'length)  :=  Phn_Filename; 
Phn_FName.Len  :=  Phn_Filename'length; 


Text_IO.Open(File  =>  Phn_File, 

Mode  =>  ln_File, 

Name  =>  Phn_Filename); 

getJine(Phn_File,Phn_Line,Phn_Last); 

if  Phn_Last  >  5  then 


if  debugging  then 
Put_Line(Phn_Line(1  ..Phn_Last)); 
end  if; 

Temp.Txt(1..Phn_Last)  :=  Upper_Case(Phn_Line(1..Phn_Last)): 
Temp.Len  :=  Phn_Last; 

Get_Atom(Temp, Atom, Tail);  --  get  the  start  N 
Last_Start  :=  LongJnteger'value(Atom.Txt(1..Atom.Len)): 

Temp  :=Tail; 

Get_Atom(Temp, Atom, Tail);  --  get  the  ending  N 
Last_End  ;=  Long_lnteger'value(Atom.Txt(1..Atom.Len)); 

Temp  ;=Tail; 

Get_Atom(Temp,Atom,Tail);  --  get  the  phoneme  string 
Last_Phoneme  :=  Get_Phoneme(Atom); 

Last_PhnAtom  :=  Atom; 

Echo_Data  (Last_Start,Last_End,Atom,Last_Phoneme); 

-  in  case  file  starts  with  closure 
If  Atom.Len  =  3  and  then 

((Atom.Txt(1..Atom.Len)  =  "BCL")  or 
(Atom.Txt(1  ..Atom.Len)  =  "DCL”)  or 
(Atom.Txt(1.. Atom.Len)  =  “GCL")  or 
(Atom.Txt(1..Atom.Len)  =  "PCL")  or 
(Atom.Txt(1  ..Atom.Len)  =  "TCL")  or 
(Atom.Txt(1  ..Atom.Len)  =  "KCL")  or 
(Atom.Txt(1..Atom.Len)  =  “DCL”)  or 
(Atom.Txt(1..Atom.Len)  =  “TCL"))  then 

Closure_Pending  :=True; 
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Closure_Start  :=  LasLStart; 

Closure_End  :=  Last_End; 

Closure_String  :=  Atom.Txt(1  ..Atom.Len); 

Last_Phoneme  :=  HX;  --  ignore  the  "last"  values 

If  Debugging  then 

Put_Line("  (Stop  closure  will  be  combined  with  stop's  duration)"); 
end  if; 

end  if; 

While  NOT  End_Of_File(Phn_File)  loop 


get_line(Phn_File,Phn_Line,Phn_Last); 
if  Phn_Last  >  5  then 
if  debugging  then 

Put_Line("  New  line  =  "  &  Phn_Line(1..Phn_Last)); 
end  if; 

Temp.Txl(1..Phn_Last)  :=  Upper_Case(Phn_Line(1..Phn_Last)); 
Temp.Len  :=  Phn_Last; 

Get_Atom(Temp, Atom, Tail);  --  get  the  start  N 
This_Start  :=  LongJntegerValue(Atom.Txt(1.. Atom.Len)); 

Temp  :=Tail; 

Get_Atom(Temp, Atom, Tail);  --  get  the  ending  N 
This_End  :=  Long_lnteger'value(Atom.Txt(1..Atom.Len)); 

Temp  :=Tail; 

Get_Atom(Temp,Atom,Tail);  -  get  the  phoneme  string 
This_Phoneme  :=  Get_Phoneme(Atom); 

Echo_Data(This_Start,This_End,Atom,This_Phoneme); 

If  Closure_Pending  then 

-  if  this  is  stop  to  match  pending  closure 
If  ((Closure_String  =  "BCL")  and  (This_Phoneme  =  B))  or 
((Closure_String  =  "DCL")  and  (This_Phoneme  =  D))  or 
((Closure_String  =  "GCL")  and  (This_Phoneme  =  G))  or 
((Closure_String  =  "PCL")  and  ^his_Phoneme  =  P))  or 
((Closure_String  =  "TCL")  and  (This_Phoneme  =  T))  or 
((Closure_String  =  “KCL")  and  (This_Phoneme  =  K))  or 
((Closure_String  =  "DCL")  and  (This_Phoneme  =  JH))or 
((Closure_String  =  "TCL")  and  (This_Phoneme  =  CH))  then 

--  record  duration  from  last  phoneme,  thru  closure  to  this 
--  phoneme's  end.  next  diphone  starts  with  CLOSURE'S  start 

Record_Duration(Last_Start,Last_End,This_End, 

Last_Phoneme,This_Phoneme,Phn_Fname); 

Last_Start  :=  Closure_Start; 
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Last_End  :=  This_End; 

Last_Phoneme  :=  This_Phoneme; 

Last_PhnAtom  :=  Atom; 

else  --  else  closure  was  not  followed  by  matching  stop 

~  record  duration  from  last  phoneme,  thru  closure's  end 
"  as  one  diphone  &  from  closure's  start  thru  this 
-  phoneme's  end  as  another,  next  diphone  starts  with  this 
--  phoneme's  start 

Temp.Txt(1)  :=  Closure_String(1): 

Temp.Len  :=  1 ; 

Closure_Phoneme  :=  Get_Phoneme(Temp); 

Record_Duration(Last_Start,Last_End,Closure_End, 

Last_Phoneme,Closure_Phoneme,Phn_Fname); 

Record_Duration(Closure_Start,Closure_End,This_End, 

Closure_Phoneme,This_Phoneme,Phn_Fname); 

Last_Start  :=  This_Start; 

Last_End  :=  This_End; 

Last_Phoneme  :=  This_Phoneme; 

Last_PhnAtom  :=  Atom; 

end  if; 

Closure_Pending  :=  false; 

--  see  if  THIS  is  a  closure 
elsif  Atom.Len  =  3  and  then 

((Atom.Txt(1..Atom.Len)  =  "BCL")  or 
(Atom.Txt(1  ..Atom.Len)  =  "DCL")  or 
(Atom.Txt(1  ..Atom.Len)  =  "GCL")  or 
(Atom.Txt(1..Atom.Len)  =  "PCL")  or 
(Atom.Txt(1  ..Atom.Len)  =  "TCL")  or 
(Atom.Txt(1..Atom.Len)  =  "KCL")  or 
(Atom.Txt(1  ..Atom.Len)  =  "DCL")  or 
(Atom.Txt(1..Atom.Len)  =  "TCL"))  then 

If  Debugging  then 

Put_Line(''  (Stop  closure  will  be  combined  with  stop's  duration)"); 
end  if; 

Closure_Pending  :=  True; 

Closure_Start  :=  This_Start; 

Closure_End  :=  This_End; 

Closure_String  :=  Atom.Txt(1.. Atom.Len); 

else  --  handle  simple  case  where  no  closures  involved 

Record_Duration(Last_Start,Last_End, 

This_End,Last_Phoneme,This_Phoneme,Phn_Fname): 

Last_Start  :=  This_Start; 
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Last_End  :=  This_End; 

Last_Phoneme  :=  This_Phoneme: 

Last_PhnAtom  :=  Atom; 

end  if; 

Echo_Data  (Last_Start,Last_End,Last_PhnAtom,Last_Phoneme); 
end  if; 
end  loop; 


--  handle  case  where  last  phoneme  was  a  closure 
if  Ciosure_Pending  then 

--  closure  was  not  followed  by  matching  stop 

--  record  duration  from  last  phoneme,  thru  closure's  end 
“  as  one  diphone 

Temp.Txt(1)  :=  Closure_String(1); 

Temp.Len  :=  1; 

Closure_Phoneme  :=  Get_Phoneme(Temp); 

Record_Duration(Last_Start,Last_End,Closure_End, 

Last_Phoneme,Closure_Phoneme,Phn_Fname); 


end  if; 


end  if; 

Close(Phn_File); 

exception 

When  Name_Error  I  Use_Error  I  Device_Error  => 

New_Line; 

Put_Line{"***  Unable  to  open  PHN  file  "  &  Phn_Filename); 
New_Line; 


end  Process_Phn_File; 


__*w****w************w*wwww***************************************^^**^****^ 

DipDurs  Main  Program 


Begin  -  DipDurs 

if  (Arg.Count  >  2)  and  then  (Arg.Data(3)  =  7d"  or  -  process  command  line 

Arg.Data(3)  =  "/D"  or 
Arg.Data{3)  =  "-d“  or 
Arg.Data(3)  =  "-D"  or 
Arg.Data(3)  =  "d"  or 
Arg.Data(3)  =  “D")  then 

Debugging  :=  True; 
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elsif  Arg.Count  >  2  then  -  note  that  program  name  is  1st  entry  in  array 
raise  Usage_Error; 
end  if; 

New_Line(25);  --  blank  lines  &  program  "banner" 

Put_Line(Version); 

New_Line(2): 

if  (Arg.Count  >  1)  then  --  if  input  filename  on  cmd  line,  open  it.  else  ask 
begin 

Text_IO.Open(File  =>  lnput_DataFile, 

Mode  =>  ln_File, 

Name  =>  Arg.Data(2)); 

lnput_Filename(1..Arg.Data(2)'length)  :=  Arg.Data(2); 
exception  --  handle  bad  filenames  from  command  line 

When  Name_Error  I 
Status_error  I 
Mode_error  I 
Use_error  I 

Device_error  =>  New_Line(3): 

Put_Line("***  Unable  to  open  “  &  Arg.Data(2)); 

New_Line; 

raise  Usage_Error: 

end; 

else 

Get_File("Enter  name  of  input  file: ", 
lnput_Filename, 
ln_File, 

lnput_Datafile); 

end  if; 


~  get  output  filename  from  user 
Get_File("Enter  name  of  output  file: ", 

Output_Filename, 

Out_File, 

Output_Datafile); 

If  Debugging  then 
New_Line; 

Put_Line(" - [  PROCESSING  FILE  ] - ") 

end  if; 

While  NOT  End_Of_File(lnput_DataFile)  loop 
Get_Line(lnput_DataFile,lnput_Line,lnput_Last); 
lnput_Line  :=  Upper_Case(lnput_Line); 
if  lnput_Last  >  0  then 
If  Debugging  then 

Put_Line{"#### "  &  lnput_Line(1..lnput_Last)  &  "  ####"); 
end  if; 

Process_Phn_File(lnput_Line(1..lnput_Last)); 
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If  Debugging  then 

Put_Line(“###########################################"); 
end  if; 

end  if; 

end  loop; 

Display_Matrix; 

If  Debugging  then 

Put_Line(" - [  DONE  ] - "); 

New_Line; 
end  if; 

TextJO.CIose(lnput_Datafile); 

TextJO.CIose(Output_Dataf  ile); 

exception 

When  Usage_Error  => 

Put(Ascii.Bel); 

New_Line{3); 

Put_Line("Usage:  DipDurs  infilename  [-d]"); 

New_Line; 

Put_Line(”  -d  =  enable  debugging"); 

New_Line; 

return; 


end  DipDurs; 


E.  DIPNUMS.ADA 


^^********************************************************************** 

♦★*★*****★★****★***★*★♦★***★****★★**★*★******★*★★**★**★*★★*★*★******** 

-**  PROGRAM  NAME:  DipNums  (DipNums.ada) 

** 

-**  AUTHOR:  Maj  Mark  Cantrell,  USMC  (cantrell@nps.navy.mil) 

— ** 

--**  DESCRIPTION:  Program  for  reading  a  list  of  phonetic  transcription 
--**  files  &  recording  how  many  times  each  diphone  was  found 
_ 

--**  DATE  MODIFIED:  30  Sep,  1995  (see  below  for  version  number) 

__*****★*★*★*★★★★*★*★*♦*★★*★***★★**★★********★*********★*★**★★**♦******* 

_^********************************************************************** 


with  Arg, 

TextJO; 

use  TextJO; 

procedure  DipNums  is 

Max_Str_Len  :  constant  :=  255; 

Max_Phn_Len  :  constant  :=  5; 

CoLWidths  :  constant  :=  6; 

Fs  :  constant  :=  16000.0; 

FList_Filename  :  constant  string  :=  "DIPFLIST.TXT"; 
DipList_Filename :  constant  string  :=  “DIPLIST.TXT”; 

Usage_Error :  exception; 

Type  STR  is  record 
Txt :  String(1..Max_Str_Len); 

Len  :  Integer  range  O..Max_Str_Len  :=  0; 
end  record; 

Temp,Tail,Atom,Atom_Str,Tail_Str,Phon_Str :  STR; 


Type  Phoneme  is  (AA,AE,AH,AO,AW,AY,B,CH,D,DH,EH,ER,EY,  -  1st  3  lines  from  CMU  lex 
F,G,HH,IH,IY,JH,K,L,M,N,NG,OW,OY,P,R, 

S,SH,T,TH,UH,UW,V,W,Y,Z.ZH, 

EM,EN,ENG,EL,AX,IX,AXR,HX,  -  this  line  from  TIMIT 
XXX);  --  use  XXX  for  unknown  codes 

Diphone_Matrix :  array(Phoneme, Phoneme)  of  natural  := 

(others  =>  (others  =>  0)); 


Dip_Num_Needed :  integer  :=  0; 

Debugging  :  boolean  :=  false; 
Recording  :  Boolean  :=  false; 
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lnput_Filename,Output_FileName :  string(1  ..Max_Str_Len)  :=  (others  => ' '); 
Name_Last :  Natural; 

lnput_Line :  string(1..Max_Str_Len): 
lnput_Last :  natural; 

Version :  constant  String  :=  "DipNums  Version  #1.0"; 
lnput_DataFiie,Output_DataFile,FList_File,DipList_OutFiie :  Text_IO.File_Type; 


UPPER_CASE  -  Returns  upper  case  version  of  input  string 

Function  Upper_Case(ln_String  :  string)  return  string  is 
Resuit_String :  String(ln_String'first..ln_String'last); 

Begin 

for  index  in  ln_String'first..ln_String'last  loop 
If  ln_String{index)  in  'a’..'z’  then 

Result_String(index)  :=character‘val(character‘pos(ln_String(index)) 
-  character’pos('a') 

+  character'pos('A')); 

else 

Result_String(index)  :=  ln_String(index); 
end  if; 

end  loop; 

return  Result_String; 
end  Upper_Case; 

__*************★★**★**★★*★★*****★***★**★*★★★★*★***♦*******************★***** 
GET_ATOM  -  Gets  next  atom  from  a  str 


Procedure  Get_Atom(ln_Str :  Str;Atom_Str,TaiLStr :  out  Str)  is 
i,a_start,a_end,ajen,tjen  :  integer  :=  1 ; 

Begin 

If  ln_Str.Len  >  0  then 

While  (ln_Str.Txt(i)  =  ’ ')  and  (i  <  ln_Str.Len)  loop 
i  :=  i  + 1 ; 
end  loop; 

If  ln_Str.Txt(i)  /=  ‘ '  then 
a_start  :=  i; 

While  (ln_Str.Txt(i)  /= ' ')  and  (i  <  ln_Str.Len)  loop 
i:=i  +  1; 
end  loop; 

if  ln_Str.Txt(i)  =  ’ '  then 
a_end  :=  i  - 1 ; 
else 

a_end  :=  i; 
end  if; 

ajen  :=  a_end  -  a_start  +  1; 
tjen  :=  ln_Str.Len  -  a_end; 

TaiLStr.Len  :=  tJen; 

Atom_Str.Len  :=  a_len; 
for  i  in  1  ..a_len  loop 

Atom_Str.Txt{i)  :=  ln_Str.Txt(a_start  +  i  - 1); 
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end  loop; 

for  i  in  1  ..tjen  loop 
TaiLStr.Txt(i)  :=  ln_Str.Txt(a_end  + 1); 
end  loop; 

else 

Atom_Str.Len  :=  0; 

TaiLStr.Len  :=  0; 
end  If; 
else 

Atom_Str.Len  :=  0; 

TaiLStr.Len  :=  0; 
end  if; 

end  GeLAtom; 

_ ★♦★★*★***★★********★★*★★★**★★★***★★★★*★*****★★**★**★**★★★**★**★******★**** 

GET_PHONEME  -  Returns  Phoneme  corresponding  to  an  atom  (atom  must  be  free 
of  leading  &  trailing  blanks  &  other  garbage  (accent  digits  on  end  OK) 


Function  Get_Phoneme  (ln_Atom :  Str)  return  Phoneme  is 
i,Phn_End :  Integer; 

Phn_String :  String  (1..Max_Phn_Len); 

Out_Phoneme :  Phoneme; 

Begin 

Phn_End  :=  ln_Atom.Len; 
if  ln_Atom.Len  >  0  then 

While  ((ln_Atom.Txt(Phn_End)  in  '0'..'9'))  and  (Phn_End  >  1)  loop 
Phn_End  :=  Phn_End  - 1 ; 
end  loop; 
end  if; 

If  (Phn_End  >  0)  and  (Phn_End  <=  Max_Phn_Len)  then 
Phn_String(1  ..Phn_End)  :=  Upper_Case(ln_Atom.Txt(1..Phn_End)); 

begin  -  block  to  catch  constraint  error  if  not  a  valid  phoneme 
OuLPhoneme  :=  Phoneme'Value(Phn_String(1..Phn_End)); 
return  OuLPhoneme; 
exception 

when  constraint_error  => 

if  Phn_End  =  2  and  then  Phn_String(1  ..2)  =  "H#“  then 
return  HX; 
else 

return  XXX; 
end  if; 

end;  -  block 
else 

return  XXX; 
end  if; 

end  GeLPhoneme; 


ECHO_DATA  -  Used  by  Process_Phn_File  to  echo  the  data  read  from  file 

__*****'****4r*********'**W***************4r^******************4r  «'*****«********* 


Procedure  Echo_Data  (Start_N,  End_N  :  Long_lnteger; 

Phon_Str :  Str; 
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Phoneme_Read :  Phoneme)  is 


begin 

if  Debugging  then 

Put_Line("  Start  = "  &  longJnteger'image(Start_N)): 
Put_Line("  End  = "  &  longJnteger'image(End_N)): 
Put_Line("  Phoneme  &  Phon_Str.Txt(1  ..Phon_Str.Len)  & 
= "  &  Phoneme'image(Phoneme_Read)); 

end  if; 

end  Echo_Data: 


-*  PUT_PHON  -  A  Put  routine  that  translates  from  my  enum 
representations  to  standard  symbols.  Used  by  Display_Matrix. 


Procedure  Put_Phon(0_File  :  File_Type:  Phon  :  Phoneme)  is 
begin 

If  Phon  =  HX  then 
Put(0_File,"H#"): 
elsif  Phon  =  XXX  then 
Put(0_File, "???"): 
else 

Put(0_File,Phoneme'image(Phon)); 
end  if; 

end  Put_Phon; 


^_*****************************************************^^*****************.*** 

-*  DISPLAY_MATRIX  -  Prints  final  diphone  matrix  to  output  file  (in  2  pieces) 


Procedure  Display_Matrix  is 
row,col,mid_col :  Phoneme; 
Count :  natural  :=  0; 
maxed_count :  natural  :=  0; 
begin 


Put  Lin6(Output  Datsfil©,"*************************'*'*************************************************************"^' 
Put_Line(OuTput_Datafile,"  MAX  DIPHONE  DURATIONS  FOR  FILES  LISTED  IN  "  ’& 

lnput_Filename); 

Pijt_Lin6(0utput  Datafil©,"******************’*********************************************************************"^ 


mid_col  :=  Phoneme'val(integer((Phoneme'pos(Phoneme'last))/2)); 
newJine(Output_Datafile); 


--  left  half  of  matrix 

for  col  in  Phonemefirst..Mid_Col  loop  —  column  labels 
set_col(Output_Datafile,coLwidths*(1  +  Phoneme'pos(col))); 
Put_Phon(Output_Datafile,Col); 
end  loop; 


187 


newJine(Output_Datafile,2); 
for  row  in  Diphone_Matrix'range  loop 
Put_Phon(Output_Datafile,Row);  --  row  label 
for  col  in  Phoneme1irst..Mid_Col  loop 

set_col(Output_Datafile,col_wiclths*(1  +  Phoneme'pos(col))): 
put(Output_Datafile,integer'image(Diphone_Matrix(row,col))): 

if  Diphone_Matrix(row,col)  >  0  then  --  count  nonzero  entries 
if  (Row  /=  XXX)  and  (Col  /=  XXX)  and 
{Diphone_Matrix(row,col)  >=  Dip_Num_Needed)  then 
maxed_count  :=  maxed_count  +  1 ; 
Put_Phon(DipList_Outfile,row); 

Put(DipList_Outfile,' '); 

Put_Phon(DipList_Outfile,col): 

New_Line(DipList_Outfile); 
end  if; 

count  :=  count  + 1 ; 
end  if; 

end  loop; 

newJine(Output_Datafile); 
end  loop; 

new_line(Output_Datafile,2); 


“  right  half  of  matrix 

for  col  in  Phoneme'succ(Mid_Col)..Phoneme'last  loop  -  col  labels 
set_col(Output_Datafile,col_widths*(Phoneme'pos(col)  -  Phoneme'pos(Mid_Col))); 
Put_Phon(Output_Datafile,col); 
end  loop; 

newJine(Output_Datafile,2); 
for  row  in  Diphone_Matrix'range  loop 
Put_Phon(Output_Datafile,Row);  --  row  label 
for  col  in  Phoneme'succ(Mid_Col)..Phoneme'last  loop 

set_col(Output_Datafile,col_widths*(Phoneme‘pos(col)  -  Phoneme'pos(Mid_Col))); 
put(Output_Datafile,integer'image(Diphone_Matrix(row,col))); 

if  Diphone_Matrix(row,col)  >  0  then  --  count  nonzero  entries 
if  (Row  /=  XXX)  and  (Col  /=  XXX)  and 
(Diphone_Matrix(row,col)  >=  Dip_Num_Needed)  then 
maxed_count  :=  maxed_count  +  1 ; 

Put_Phon(DipList_Outfile,row); 

Put(DipList_Outfile,' '); 

Put_Phon(DipList_Outfile,col); 

New_Line(DipList_Outfile); 
end  if; 

count  :=  count  + 1 ; 
end  if; 


end  loop; 

newJine(Output_Datafile); 
end  loop; 


New_Line(Output_Datafile,3); 


Lin©(Output  Qsitcifil©  "****************************************************************************************» j 
New_Line(Output_Datafile); 
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Total  Recognized  Diphones  with  nonzero  entries  =  “  & 


Total  Recognized  Diphones  with  requested  number  of  files  = "  & 


--  Now  list  nonzero  entries 

Put_Line(Output_Datafile, “NONZERO  ENTRIES  ARE:"); 

New_Line(Output_Datafile); 

for  row  in  Diphone_Matrix'range  loop 
for  col  in  Phoneme'first..Phoneme'last  loop 
if  Diphone_Matrix(row,col)  >  0  then 
Put_Phon(Output_Datafile,Row): 
set_col(Output_Datafile,coLwidths): 

Put_Phon(Output_Datafile,Col); 

set_col(Output_Datafile,(3*coLwidths)-(integer'image(Diphone_Matrix(row,col))'length)); 
put(Output_Datafile,integer‘image(Diphone_Matrix(row,col))); 
newJine(Output_Dataf  ile) ; 
end  if; 
end  loop; 
end  loop; 


Put_Line{Output_Datafile," 
integer'image(count)); 
New_Line(Output_Dataf  ile) ; 
Put_Line(Output_Datafile," 
integer'image(maxed_count)); 
New_Une(Output_Datafile) ; 

Put_Line(Output_Datafile, "******' 

New_Line(Output_Datafile,3); 


end  Display_Matrix; 


GET_FILE  -  Prompts  user  for  filename,  opens  file,  &  returns  name  &  handle 
_ *********************★*★***★**★**★★*★★**★★*****★*****★*★*★*★★************* 


Procedure  Get_File(Prompt_String :  String; 

F_Name :  out  String; 

F_Mode :  File_Mode; 

F_Handle  :  in  out  File_Type)  is 
Str_Last :  Integer; 

Str_Text :  String(1..Max_Str_Len)  :=  (others  => ' '); 
begin 


loop  --  loop  until  valid  filename  entered  or  user  aborts  {'^) 

Begin  --  block  statements  to  allow  another  try  if  bad  filename  entered 

Put(P  rompt_Stri  ng) ; 

Get_Line(Str_Text,Str_Last); 


if  F_Mode  =  ln_File  then 
Text_IO.Open(File  =>  F_Handle, 
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Mode  =>  F_Mode, 

Name  =>  Str_Text(1  ..Str_Last)); 

else 

Text_IO.Create(File  =>  F_Handle, 

Mode  =>  F_Mode, 

Name  =>  Str_Text(1  ..Str_Last)); 

end  if; 

F_Name  :=  Str_Text; 
exit; 
exception 

When  Name_Error  =>  Put(Ascii.Bei); 

Put_Line("***  Unable  to  open "  &  Str_Text(1  ..Str_Last)); 
New_Line; 

When  others  =>  raise; 

end; 

end  loop; 
end  Get_File; 


*****♦***★★****★★*★*•*★*★*♦*★★*****★*★★*******•*************★★************** 

“*  Record_Diphone  -  Used  by  Process_Phn_File  to  record  a  dIphone's  duration 
in  Diphone_Matrix.  Durations  of  HX  (silence)  &  XXX  (unknown)  not  counted 
_ ********★****★***★*♦★**********★**********************★**♦★*★*★★***★♦***★* 

Procedure  Record_Diphone  (Start_N,  Mid_N,  End_N  :  Long_lnteger; 

First_Phon,Second_Phon :  Phoneme; 

Phon_Filename :  Str)  is 

LocaLFName :  Str; 

Valid_Dip :  Boolean  :=  true;  --  only  valid  dips  count 
begin 

if  (First_Phon  =  XXX)  or  (Second_Phon  =  XXX)  then 
Valid_Dip  :=  False; 
end  if; 

if  recording  and  Valid_Dip  then 
LocaLFname  :=  Phon_Filename; 

If  (LocaLFname.Len  >  4)  and  then 

(LocaLFName.Txt((LocaLFname.Len-3)..LocaLFname.Len))  =  ".PHN"  then 
LocaLFName.Txt((LocaLFname.Len-3).. LocaLFname.Len)  :=  ".WAV“; 
Put(FList_File,Local_FName.Txt(1.. LocaLFname.Len)  &  " "); 

Put(FList_File,Phoneme’lmage(First_Phon)  &  &  Phoneme'lmage(Second_Phon)  & " "); 

Put(FList_File,long_integer'image(Start_N)  & " "); 
Put(FList_File,longJnteger'image(Mid_N)  & " "); 
Put_Line(FList_File,long_integer'image(End_N)); 
end  if; 
end  if; 


Diphone_Matrix(First_Phon,Second_Phon)  :=  Diphone_Matrix(First_Phon,Second_Phon)  +  1 ; 
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if  Debugging  then 
Put_Line("  Diphone  =  “  & 
Phoneme'image(First_Phon)  & & 
Phoneme'l  mage(Second_Phon)) ; 
New_Line: 
end  if; 

end  Record_Diphone: 


__***it*****-k^t*-k*’k************iiitit****it*ifkit1t1t1tifk**ifk**iric*it*-k*ifkii'k'k*****i^**it1rit1( 

PROCESS_PHN_FILE  -  Opens  designated  file  &  records  diphone  durations 

__**4r***-*****4r4r*4r***i^***********4r**********«*:kik  ***************************** 

Procedure  Process_Phn_File  (Phn_Filename  :  string)  is 

Phn_Fiie :  TextJO.File_Type; 

Last_PhnAtom,Phn_FName :  Str; 

This_Start,This_End,Last_Start,Last_End,Ciosure_Start,Closure_End :  iong_integer; 
Ciosure_Pending  :  Booiean  :=  False; 

Phn_Line :  string(1..Max_Str_Len); 

Phn_Last :  natural; 

This_Phoneme,Last_Phoneme,Closure_Phoneme :  Phoneme; 

Closure_String :  String(1..3); 


begin 


--  set  up  filename  str  for  Record_Diphone  calls 
Phn_FName.Txt(1  ..Phn_Filename'length)  :=  Phn_Filename; 
Phn_FName.Len  :=  Phn_Filename'length; 


Text_IO.Open(File  =>  Phn_File, 

Mode  =>  ln_File, 

Name  =>  Phn_Filename); 


get_line(Phn_File,  Phn_Line,  Phn_Last) ; 


if  Phn_Last  >  5  then 


if  debugging  then 
Put_Line(Phn_Line(1  ..Phn_Last)); 
end  if; 

Temp.Txt(1..Phn_Last)  :=  Upper_Case(Phn_Line{1..Phn_Last)); 
Temp.Len  :=  Phn_Last; 

Get_Atom(Temp, Atom, Tail);  --  get  the  start  N 
Last_Starl  :=  Long_lnteger'value(Atom.Txt(1..Atom.Len)); 

Temp  ;=Tail; 

Get_Atom(Temp, Atom, Tail);  --  get  the  ending  N 
Last_End  :=  Long_lntegerValue(Atom.Txt{1..Atom.Len)); 

Temp  :=Tail; 

Get_Atom{Temp,Atom,Tail);  --  get  the  phoneme  string 
Last_Phoneme  :=  Get_Phoneme(Atom); 

Last_PhnAtom  :=  Atom; 
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Echo_Data  (Last_Start,Last_End,Atom,Last_Phoneme); 

-  in  case  file  starts  with  closure 
If  Atom.Len  =  3  and  then 

((Atom.Txt(1..Atom.Len)  =  "BCL")  or 
(Atom.Txt(1.. Atom.Len)  =  “DCL")  or 
(Atom.Txt(1.. Atom.Len)  =  "GCL")  or 
(Atom.Txt(1  ..Atom.Len)  =  "PCL")  or 
(Atom.Txt(1..Atom.Len)  =  "TCL")  or 
(Atom.Txt(1.. Atom.Len)  =  "KCL")  or 
(Atom.Txt(1.. Atom.Len)  =  “DCL")  or 
(Atom.Txt(1  ..Atom.Len)  =  "TCL"))  then 

Closure_Pending  :=  True; 

Closure_Start  :=  Last_Start; 

Closure_End  :=  Last_End; 

Closure_String  :=  Atom.Txt(1  ..Atom.Len); 

Last_Phoneme  :=  HX;  --  ignore  the  "last"  values 

If  Debugging  then 

Put_Line("  (Stop  closure  will  be  combined  with  stop's  duration)"); 
end  if; 

end  if; 

While  NOT  End_Of_File(Phn_File)  loop 
getJine(Phn_File,Phn_Line,Phn_Last); 
if  Phn_Last  >  5  then 
if  debugging  then 

Put_Line("  New  line  = "  &  Phn_Line(1  ..Phn_Last)); 
end  if; 

Temp.Txt(1..Phn_Last)  :=  Upper_Case(Phn_Line(1..Phn_Last)); 
Temp.Len  :=  Phn_Last; 

Get_Atom(Temp, Atom, Tail);  -  get  the  start  N 
This_Start  :=  Long_lnteger’value(Atom.Txt(1  ..Atom.Len)); 

Temp  :=  Tail; 

Get_Atom(Temp, Atom, Tail);  --  get  the  ending  N 
This_End  :=  LongJnteger'value(Atom.Txt(1  ..Atom.Len)); 

Temp  :=Tail; 

Get_Atom(Temp,Atom,Tail);  --  get  the  phoneme  string 
This_Phoneme  :=  Get_Phoneme(Atom); 

Echo_Data  (This_Start,This_End,Atom,This_Phoneme); 


If  Closure_Pending  then 

-  if  this  is  stop  to  match  pending  closure 
If  ((Closure_String  =  "BCL")  and  (This_Phoneme  =  B))  or 
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((Closure_String  =  "DCL")  and  (This_Phoneme  =  D))  or 
((Closure_String  =  “GCL“)  and  (This_Phoneme  =  G))  or 
((Closure_String  =  "PCL")  and  ^his_Phoneme  =  P))  or 
((Closure_String  =  “TCL")  and  (This_Phoneme  =  T))  or 
({Closure_String  =  "KCL")  and  {This_Phoneme  =  K))  or 
((Closure_String  =  "DCL")  and  (This_Phoneme  =  JH))or 
((Closure_String  =  "TCL")  and  (This_Phoneme  =  CH))  then 

--  record  duration  from  last  phoneme,  thru  closure  to  this 
--  phoneme's  end.  next  diphone  starts  with  CLOSURE'S  start 

Record_Diphone(Last_Start,Last_End,This_End, 

Last_Phoneme,This_Phoneme,Phn_Fname); 

Last_Start  :=  Closure_Start: 

Last_End  :=  This_End: 

Last_Phoneme  :=  This_Phoneme; 

Last_PhnAtom  :=  Atom; 

else  --  else  closure  was  not  followed  by  matching  stop 

--  record  duration  from  last  phoneme,  thru  closure's  end 

-  as  one  diphone  &  from  closure's  start  thru  this 

"  phoneme's  end  as  another,  next  diphone  starts  with  this 

-  phoneme's  start 

Temp.Txt(1)  :=  Closure_String(1): 

Temp.Len  :=  1; 

Closure_Phoneme  :=  Get_Phoneme(Temp); 

Record_Diphone(Last_Start,Last_End,Closure_End, 

Last_Phoneme,Closure_Phoneme,Phn_Fname): 

Record_Diphone(Closure_Start,Closure_End,This_End, 

Closure_Phoneme,This_Phoneme,Phn_Fname); 

Last_Start  :=  This_Start: 

Last_End  ;=  This_End; 

Last_Phoneme  :=  This_Phoneme; 

Last_PhnAtom  :s=  Atom; 

end  if; 

Closure_Pending  :=  false; 

-  see  if  THIS  is  a  closure 
elsif  Atom.Len  =  3  and  then 

((Atom.Txt(1..Atom.Len)  =  "BCL")  or 
(Atom.Txt(1  ..Atom.Len)  =  "DCL")  or 
(Atom.Txt(1.. Atom.Len)  =  "GCL")  or 
(Atom.Txt(1  ..Atom.Len)  =  "PCL")  or 
(Atom.Txt(1..Atom.Len)  =  TCL")  or 
(Atom.Txt(1..Atom.Len)  =  "KCL")  or 
(Atom.Txt(1.. Atom.Len)  =  "DCL")  or 
(Atom.Txt(1..Atom.Len)  =  "TCL"))  then 
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If  Debugging  then 

Put_Line("  (Stop  closure  will  be  combined  with  stop's  duration)"); 
end  if; 

Closure_Pending  :=  True; 

Closure_Start  :=  This_Start; 

Closure_End  :=  This_End; 

Closure_String  :=  Atom.Txt(1..Atom.Len); 

else  "  handle  simple  case  where  no  closures  involved 

Record_Diphone(Last_Start,Last_End, 

This_End,Last_Phoneme,This_Phoneme,Phn_Fname); 

Last_Start  :=  This_Start; 

Last_End  :=  This_End; 

Last_Phoneme  :=  This_Phoneme; 

Last_PhnAtom  :=  Atom; 

end  if; 

Echo_Data  (Last_Starl,Last_End,Last_PhnAtom,Last_Phoneme); 
end  if; 
end  loop; 


--  handle  case  where  last  phoneme  was  a  closure 
if  Closure_Pending  then 

“  closure  was  not  followed  by  matching  stop 

--  record  duration  from  last  phoneme,  thru  closure’s  end 
-  as  one  diphone 

Temp.Txt(l)  :=  Closure_String(1); 

Temp.Len  :=  1; 

Closure_Phoneme  :=  Get_Phoneme(Temp); 

Record_Diphone(Last_Start,Last_End,Closure_End, 

Last_Phoneme,Closure_Phoneme,Phn_Fname); 


end  if; 


end  if; 

Close(Phn_File); 

exception 

When  Name_Error  I  Use_Error  I  Device_Error  => 

New_Line; 

Put_Line("***  Unable  to  open  PHN  file  "  &  Phn_Filename); 
New_Line; 
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end  Process_Phn_File; 


_ ★***★******★★★★★*★******★*★★*★★**★***★***★★**★★★★★★★★****★**★*★***★**★★★★★ 

-*  DipNunrjs  Main  Program 

_^_^it******************************************************************^****** 

Begin  --  DipNums 

if  (Arg.Count  >  2)  and  then  (Arg.Data(3)  =  7d"  or  --  process  command  line 

Arg.Data(3)  =  “/D"  or 
Arg.Data(3)  =  "-d"  or 
Arg.Data(3)  =  "-D"  or 
Arg.Data(3)  =  "d"  or 
Arg.Data(3)  =  "D")  then 

Debugging  :=  True; 

elsif  Arg.Count  >  2  then  -  note  that  program  name  is  1st  entry  in  array 
raise  Usage_Error; 
end  if; 

New_Line(25):  -  blank  lines  &  program  "banner" 

Put_Line(Version); 

New_Line(2); 

if  (Arg.Count  >  1)  then  -  if  input  filename  on  cmd  line,  open  it.  else  ask 
begin 

Text_IO.Open(File  =>  lnput_DataFile, 

Mode  =>  ln_File, 

Name  =>  Arg.Data(2)); 

lnput_Filename(1..Arg.Data(2)'length)  :=  Arg.Data(2); 
exception  -  handle  bad  filenames  from  command  line 

When  Name_Error  I 
Status_error  I 
Mode_error  I 
Use_error  I 

Device_error  =>  New_Line(3); 

Put_Line(”***  Unable  to  open "  &  Arg.Data(2)): 

New_Line: 

raise  Usage_Error; 

end; 

else 

Get_File("  Enter  name  of  input  file; ", 
lnput_Filename, 
ln_File, 

lnput_Datafile); 

end  if; 

--  get  output  filename  from  user 
Get_File("Enter  name  of  output  file: ", 

Output_Filename, 

Out_File, 

Output_Datafile); 


--  get  number  of  files  to  record  per  dIphone 
loop  -  loop  until  valid  integer  entered  or  user  aborts  {'O) 


Begin  --  block  statements  to  allow  another  try  if  bad  filename  entered 


Put("How  many  instances  of  each  diphone  required  (0  if  none)? "); 
Get_Line(lnput_Line,lnput_Last); 

Dip_Num_Needed  :=  integer‘value(input_line(1  ..inputjast)); 

If  Dip_Num_Needed  >  0  then 

Text_IO.Create(File  =>  FList_File, 

Mode  =>  Out_File, 

Name  =>  FList_Filename); 

Text_IO.Create(File  =>  DipList_OutFile, 

Mode  =>  Out_File, 

Name  =>  DipList_Filename); 

Recording  :=  True; 

else 

Recording  :=  false; 
end  if; 
exit; 
exception 

When  constraint_Error  =>  Put(Ascii.Bel); 

Put_Line("***  You  must  enter  an  integer  (0  to  cancel  recording)  ***"); 
New_Line; 

When  others  =>  raise; 

end; 

end  loop; 

If  Debugging  then 
New_Line; 

Put_Line(“ - [  PROCESSING  FILE  ] - "); 

end  if; 

While  NOT  End_Of_File(lnput_DataFile)  loop 
Get_Line(lnput_DataFile,  lnput_Line,  lnput_Last); 
lnput_Line  :=  Upper_Case(lnput_Line); 
if  lnput_Last  >  0  then 
If  Debugging  then 

Put_Line("####  “  &  lnput_Line(1..lnput_Last)  &  "  ####"); 
end  if; 

Process_Phn_File(lnput_Line(1..lnput_Last)); 

If  Debugging  then 

Put_Line("###########################################”); 
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end  if; 
end  if; 


end  loop; 

Display_Matrix; 

If  Debugging  then 

Put_Line(" - [  DONE] - ■■); 

New_Line; 
end  if; 

Text_IO.CIose(lnput_Datafile); 

T  ext_IO.CIose(Output_Datafile); 

If  Recording  then 
T  ext_IO.CIose(FList_File); 

T  ext_IO.CIose(DipList_Outf  ile); 
end  if; 

exception 

When  Usage_Error  => 

Put(Ascii.Bel); 

New_Line(3); 

Put_Line(“Usage:  DipNums  infilename  [-d]“); 

New_Line; 

Put_Line("  -d  =  enable  debugging"); 

New_Line; 

return; 

end  DipNums; 
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F.  MAKFLIST.ADA 


_*****★★*********★★***************************************★*★*★******** 

-**  PROGRAM  NAME:  MakFList  (MakFList.ada) 

--*•  AUTHOR:  Maj  Mark  Cantrell,  USMC  (cantrell@nps.navy.mil) 

— ** 

--**  DESCRIPTION:  Program  for  reading  a  list  of  TIMIT  WAV/diphone 
--**  files  &  generating  a  new  list  with  N  occurences  of  each  diphone 
“**  contained  in  DIPLIST.TXT. 

_ ★* 

--**  DATE  MODIFIED:  3  Oct,  1995  (see  below  for  version  number) 

— ** 

it****it*ir**ifkirit*i(it*itick*ifk*ifk****it****-k*1fk*****-k*1t***it*it**1t**it1t*itifk*ifk** 

★♦★★★★★★★★★★♦★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★it*********************** 


with  Arg, 

TextJO; 

useTextJO; 

procedure  MakFList  is 

Max_Str_Len  :  constant  :=  255; 

Max_Phn_Len  :  constant  :=  5; 

CoLWidths  :  constant  :=  6; 

Fs :  constant  :=  16000.0; 

DipList_Filename  :  constant  string  :=  "DIPLIST.TXT"; 

Usage_Error ;  exception; 

Type  STR  is  record 
Txt :  String(1..Max_Str_Len); 

Len :  Integer  range  O..Max_Str_Len  :=  0; 
end  record; 

Temp,Tail,Atom,Atom_Str,Tail_Str,Phon_Str :  STR; 


Type  Phoneme  is  (AA,AE,AH,AO,AW,AY,B,CH,D,DH,EH,ER,EY,  -  1st  3  lines  from  CMU  lex 
F,G,HH,IH,IY,JH,K,L,M,N,NQ,OW,OY,P,R, 

S,SH,T,TH,UH,UW,V,W,Y,Z,ZH, 

EM,EN,ENG,EL,AX,IX,AXR,HX,  -  this  line  from  TIMIT 
XXX);  -  use  XXX  for  unknown  codes 

Diphone_Matrix ;  array(Phoneme,Phoneme)  of  natural  := 

(others  =>  (others  =>  integer’Iast)); 


Dip_Rec_Limit :  integer  :=  0; 

Debugging :  boolean  :=  false; 
Recording  :  Boolean  :=  false; 
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lnput_Filename,Output_FileName  :  string(1..Max_Str_Len)  :=  (others  => '  ’); 
Name_Last :  Natural; 

lnput_Line :  string(1  ..Max_Str_Len); 
lnput_Last :  natural; 

Version :  constant  String  :=  "MakFList  Version  #1 .0"; 
lnput_DataFile,Output_DataFile,DipList_lnFile :  Text_IO.File_Type; 

UPPER_CASE  -  Returns  upper  case  version  of  input  string 

Function  Upper_Case(ln_String :  string)  return  string  is 
Result_String :  String(ln_String1irst..ln_String'last); 

Begin 

for  index  in  ln_String'first..ln_String'last  loop 
If  ln_String(index)  in  'a'..'2'  then 

Result_String(index)  :=  character’val(character‘pos(ln_String(index)) 

-  character'pos('a') 

+  character’pos('A')); 

else 

Result_String(index)  :=  ln_String(index); 
end  if; 

end  loop; 

return  Result_String; 
end  Upper_Case; 

__*******************  **4r***4r«*4r*4r**«***«r*****************-**********1^******«* 

“*  GET_ATOM  -  Gets  next  atom  from  a  str 


Procedure  Get_Atom(ln_Str :  Str;Atom_Str,Tail_Str :  out  Str)  is 
i,a_start,a_end,ajen,tjen  :  integer  :=  1 ; 

Begin 

If  ln_Str.Len  >  0  then 

While  (ln_Str.Txt(i)  = ' ')  and  (i  <  ln_Str.Len)  loop 
i  :=  i  +  1; 
end  loop; 

If  ln_Str.Txt(i)  /=  ’  ’  then 
3  st3rt  i’ 

While  (ln_Str.Txt{i)  /= '  ’)  and  (i  <  ln_Str.Len)  loop 
i  :=  i  +  1 ; 
end  loop; 

if  ln_Str.Txt(i)  =  "then 
a_end  :=  i  - 1 ; 
else 

a_end  :=  i; 
end  if; 

ajen  :=  a_end  -  a_start  + 1 ; 
tjen  :=  ln_Str.Len  -  a_end; 

TaiLStr.Len  :=  t_len; 

Atom_Str.Len  :=  aJen; 
for  i  in  1  ..ajen  loop 
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Atom_Str.Txt(i)  :=  ln_Str.Txt(a_start  +  i  - 1): 
end  loop; 

for  i  in  1  ..Men  loop 

TaiLStr.Txt(i)  :=  ln_Str.Txt(a_end  +  i); 
end  loop; 

else 


Atom_Str.Len  :=  0; 
TaiLStr.Len  :=  0; 
end  if; 
else 

Atom_Str.Len  :=  0; 
TaiLStr.Len  :=  0; 
end  if; 

end  GeLAtom; 


********************-**************«*******************«******************* 

"*  GET_PHONEME  -  Returns  Phoneme  corresponding  to  an  atom  (atom  must  be  free 
of  leading  &  trailing  blanks  &  other  garbage  (accent  digits  on  end  OK) 


Function  Get_Phoneme  (ln_Atom :  Str)  return  Phoneme  is 
i,Phn_End :  Integer; 

Phn_String :  String  (1..Max_Phn_Len): 

OuLPhoneme :  Phoneme; 

Begin 

Phn_End  :=  ln_Atom.Len; 
if  ln_Atom.Len  >  0  then 

While  ((ln_Atom.Txt(Phn_End)  in  '0'..’9'))  and  (Phn_End  >  1)  loop 
Phn_End:=Phn_End-1; 
end  loop; 
end  if; 

If  (Phn_End  >  0)  and  (Phn_End  <=  Max_Phn_Len)  then 
Phn_String(1  ..Phn_End)  :=  Upper_Case(ln_Atom.Txt(1..Phn_End)); 

begin  --  block  to  catch  constraint  error  if  not  a  valid  phoneme 
OuLPhoneme  :=  Phoneme'Value(Phn_String(1..Phn_End)); 
return  OuLPhoneme; 
exception 

when  constrainLerror  => 

if  Phn_End  =  2  and  then  Phn_String(1..2)  =  "H#"  then 
return  HX; 
else 

return  XXX; 
end  if; 

end;  --block 
else 

return  XXX; 
end  if; 

end  GeLPhoneme; 

^^★★***********************************************************^************ 

-*  PUT_PHON  -  A  Put  routine  that  translates  from  my  enum 
-*  representations  to  standard  symbols.  Used  by  Display_Matrix. 


200 


Procedure  Put_Phon(0_File  :  File_Type;  Phon  :  Phoneme)  is 
begin 

If  Phon  =  HX  then 
Put(0_File,"H#"): 
elsif  Phon  =  XXX  then 
Put(0_File, "???"): 
else 

Put(0_File,Phoneme'image(Phon)): 
end  if; 

end  Put_Phon; 


GET_FILE  -  Prompts  user  for  filename,  opens  file,  &  returns  name  &  handle 

A************************************************************************* 


Procedure  Get_File(Prompt_String :  String; 

F_Name :  out  String; 

F_Mode :  File_Mode; 

F_Handle :  in  out  File_Type)  is 
Str_Last :  Integer; 

Str_Text :  String(1..Max_Str_Len)  :=  (others  => ' '); 
begin 


loop  --  loop  until  valid  filename  entered  or  user  aborts 
Begin  --  block  statements  to  allow  another  try  if  bad  filename  entered 

Put(Prompt_String); 

Get_Line(Str_T  ext,Str_Last) ; 


if  F_Mode  =  ln_File  then 
TextJO.Open(File  =>  F_Handle, 

Mode  =>  F_Mode, 

Name  =>  Str_Text(1..Str_Last)); 

else 

Text_IO.Create(File  =>  F_Handle, 

Mode  =>  F_Mode, 

Name  =>  Str_Text(1..Str_Last)); 

end  if; 

F_Name  :=  Str_Text; 


exception 

When  Name_Error  =>  Put(Ascii.Bel); 

Put_Line(“***  Unable  to  open "  &  Str_Text(1..Str_Last)); 
New_Line; 

When  others  =>  raise; 

end; 

end  loop; 
end  Get_File; 
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****★★****★******★**************★★★★★****★★★*★**★***★★★****★★★★**★♦★★*★*★* 
“*  Record_Diphone  -  Output  the  line  if  this  diphone  still  needed 

A************************************************************************* 


Procedure  Record_Diphone  (File_Line :  String; 

First_Phon,Second_Phon :  Phoneme)  is 

Valid_Dip :  Boolean  :=  true;  --  only  valid  dips  count 

begin 

if  (First_Phon  =  XXX)  or  (Second_Phon  =  XXX)  then 
Valid_Dip  :=  False; 
end  if; 

if  recording  and  Valid_Dip  and 

(Diphone_Matrix(First_Phon,Second_Phon)  <  Dip_Rec_Limit)  then 
Put_Line(Output_Datafile,File_Line); 

Diphone_Matrix(First_Phon,Second_Phon)  :=  Diphone_Matrix(First_Phon,Second_Phon)  + 1; 
end  if; 

if  Debugging  then 
Put_Line("Diphone  = "  & 

Phoneme'lmage(First_Phon)  & & 

Phoneme'lmage(Second_Phon)); 

New_Line; 
end  if; 

end  Record_Diphone; 

ititit****'k*ie*1t***-kie**icit-k*ieit-k***1fk*1fk*******it*******1t****’k*it**1t***it‘kieie*ime*it** 

PROCESS_LINE  -  Process  line  from  file  list  &  output  if  that  diphone  is 
still  needed 

Procedure  Process_Line(File_Line  :  string)  is 
Hyphen_Pos  :  natural  :=  0; 

First_Phn_Str,Last_Phn_Str,Line_Str,Tail.Temp :  Str; 

Row, Col :  Phoneme; 

begin 

Line_Str.Txt(1..File_Line'length)  :=  File_Line; 

Line_Str.Len  :=  File_Line'length; 

Get_Atom(Line_Str, Temp, Tail);  --  fetch  and  discard  filename 

Temp  :=  Tail; 

Get_Atom(Temp,First_Phn_Str,Tail);  --  fetch  diphone  string 

Last_Phn_Str  :=  First_Phn_Str;  --  store  diphone  in  both  str's  &  then 

-  remove  one  phoneme  from  each 
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For  i  in  1..First_Phn_Str.Len  loop  --  determine  where  first  phoneme  ends 
if  (First_Phn_Str.Txt(i)  =  or  (First_Phn_Str.Txt(i)  =  '/■)  then 
Hyphen_Pos  :=  i; 
end  if; 
end  loop; 

First_Phn_Str.Len  :=  Hyphen_Pos  - 1;  --  “remove"  2nd  phoneme  from  tail  of  1st 

--  remove  1st  phoneme  from  front  of  2nd 
For  i  in  Hyphen_Pos+1  ..Last_Phn_Str.Len  loop 
Last_Phn_Str.Txt(i-Hyphen_Pos)  :=  Last_Phn_Str.Txt(i); 
end  loop; 

Last_Phn_Str.Len  :=  Last_Phn_Str.Len  -  Hyphen_Pos; 

Row  :=  Get_Phoneme(First_Phn_Str); 

Col  :=  Get_Phoneme(Last_Phn_Str); 


Record_Diphone(File_Line,Row,Col); 
end  Process_Line; 


-*  INIT_MATRIX  -  Set  matrix  cells  to  zero  if  that  diphone  is  listed  in 
"*  diplist.txt.  That  allows  file  lines  with  those  diphones  to  be  output. 

Procedure  lnit_Matrix  is 

lnput_Line :  String(1  ..Max_Str_Len); 
lnput_Last :  natural; 

First_Phn_Str,Last_Phn_Str,Line_Str, Tail, Temp  :  Str; 

Row, Col :  Phoneme; 

Begin 

Text_IO.Open(File  =>  DipListJnFile, 

Mode  =>  ln_File, 

Name  =>  DipList_Filename); 

While  NOT  End_Of_File(DipList_lnFile)  loop 

Get_Line(DipList_lnFile,lnput_Line,lnput_last); 

Line_Str.Txt(1..lnput_Last)  :=  lnput_Line(1..lnput_Last); 
Line_Str.Len  :=  lnput_Last; 

Get_Atom(Line_Str,First_Phn_Str,Tail);  --  fetch  first  phoneme  str 
Get_Atom(Tail,Last_Phn_Str,Temp);  --  fetch  last  phoneme  str 


Row  :=  Get_Phoneme(First_Phn_Str); 
Col  :=  Get_Phoneme{Last_Phn_Str); 

Diphone_Matrix(row,col)  :=  0; 


203 


end  loop; 
end  lnit_Matrix: 


-*  MakFList  Main  Program 


Begin  --  MakFList 

if  (Arg.Count  >  2)  and  then  (Arg.Data(3)  =  7d"  or  --  process  command  line 

Arg.Data(3)  =  "/D"  or 
Arg.Data(3)  =  "-d"  or 
Arg.Data(3)  =  "-D"  or 
Arg.Data(3)  =  "d"  or 
Arg.Data(3)  =  "D")  then 

Debugging  :=  True; 

elsif  Arg.Count  >  2  then  -  note  that  program  name  is  1st  entry  in  array 
raise  Usage_Error; 
end  if; 

New_Line(25);  -  blank  lines  &  program  "banner" 

Put_Line(Version); 

New_Line(2); 

if  (Arg.Count  >  1 )  then  --  if  input  filename  on  cmd  line,  open  it.  else  ask 
begin 

TextJO.Open(File  =>  lnput_DataFile, 

Mode  =>  ln_File, 

Name  =>  Arg.Data(2)); 

lnput_Filename(1..Arg.Data{2)'length)  :=  Arg.Data(2); 
exception  ~  handle  bad  filenames  from  command  line 

When  Name_Error  I 
Status_error  I 
Mode_error  I 
Use_error  I 

Device_error  =>  New_Line(3); 

Put_Line{"***  Unable  to  open  "  &  Arg.Data(2)); 

New_Line; 

raise  Usage_Error; 

end; 

else 

Get_File("Enter  name  of  input  file: ", 
lnput_Filename, 
ln_File, 

lnput_Datafile); 

end  if; 


-  get  output  filename  from  user 
Get_File("Enter  name  of  output  file: ", 

Output_Filename, 

Out_File, 

Output_Datafile); 
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--  get  number  of  files  to  record  per  diphone 
loop  "  loop  until  valid  integer  entered  or  user  aborts  (^C) 

Begin  --  block  statements  to  allow  another  try  if  bad  filename  entered 

PutC'How  many  filenames  required  for  each  diphone?  “); 
Get_Line(lnput_Line,lnput_Last); 

Dip_Rec_Limit  :=  integer'value(input_line(1..input_last)); 

If  Dip_Rec_Limit  >  0  then 

lnit_Matrix; 

Recording  :=True: 

else 

raise  constraint_error; 
end  if; 
exit; 
exception 

When  constraint_Error  =>  Put(Ascii.Bel); 

Put_Line("***  You  must  enter  a  nonzero  integer  ***“); 
New_Line; 

When  others  =>  raise; 

end; 

end  loop; 


If  Debugging  then 
New_Line; 

Put_Line(" - [  PROCESSING  FILE  ] - "); 

end  if; 

While  NOT  End_Of_File(lnput_DataFile)  loop 
Get_Line(lnput_DataFile,lnput_Line,lnput_Last): 
lnput_Line  :=  Upper_Case(lnput_Line); 
if  lnput_Last  >  0  then 
If  Debugging  then 

Put_Line(“####  "  &  lnput_Line(1..lnput_Last)  &  "  ####"); 
end  if; 

Process_Line(lnput_Line(1..lnput_Last)): 

If  Debugging  then 

Put_Line("###########################################“); 
end  if; 

end  if; 
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end  loop; 


If  Debugging  then 

Put_Line(" - [  DONE  ] - “); 

New_Line; 
end  if; 

TextJO.CIose(lnput_DatafiIe); 

TextJO.CIose(Output_Datafile); 

TextJO.CIose(DipList_lnfile); 

exception 

When  Usage_ErrDr  => 

Put(Ascii.Bei); 

New_Line(3); 

Put_Line("Usage:  MakFList  infilename  [-d]"); 

New_Line; 

Put_Line("  -d  =  enable  debugging"); 

New_Line; 

return; 

end  MakFList; 
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G.  VOCAB.ADA 


_ **★*♦♦★*★**♦★*★★**★★★★★★★★***♦*★★*♦**♦**★************★***★★*******★*** 

********************************************************************** 

-**  PROGRAM  NAME:  Vocab  (vocab.ada) 

“**  AUTHOR:  Maj  Mark  Cantrell,  USMC  (cantrell@nps.navy.mil) 

— ** 

”**  DESCRIPTION:  Program  for  reading  a  phonetic  dictionary  &  creating  a 
”**  text  file  list  of  words  in  that  dictionary  that  are  composed  of 
-**  diphones  in  DIPLIST.TXT. 

— ** 

“**  DATE  MODIFIED:  8  Mar,  1996  (see  below  for  version  number) 

— ** 


with  Arg, 

Text_IO: 

use  Text_IO: 

procedure  Vocab  is 

Version  :  constant  String  :=  "Vocab  Version  #3.0"; 

DipList_FileName :  constant  String  :=  "diplistbct"; 

Max_Str_Len :  constant  :=  255; 

Max_Phn_Len :  constant  :=  5; 

CoLWidths :  constant  :=  6; 

Usage_Error :  exception; 

Type  STR  is  record 
Txt :  String(1..Max_Str_Len); 

Len  :  Integer  range  O..Max_Str_Len  :=  0; 
end  record; 

Temp,Tail,Atom,Word.Atom_Str,Tail_Str,Phon_Str :  STR; 


Type  Phoneme  is  (AA,AE,AH,AO,AW,AY,B,CH,D,DH,EH,ER,EY,  --  1st  3  lines  from  CMU  lex 
F,G,HH,IH,IY,JH,K,L,M,N,NG,OW,OY,P,R, 

S,SH,T,TH,UH,UW,V,W,Y,Z,ZH, 

EM,EN,ENG,EL,AX,IX,AXR,HX,  --  this  line  from  TIMIT 
XXX);  -  use  XXX  for  unknown  codes 

Diphone_Matrix :  array(Phoneme,Phoneme)  of  natural  := 

(others  =>  (others  =>  0)); 


This_Phoneme,Last_Phoneme :  Phoneme; 
Debugging :  boolean  :=  false; 
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lnput_Filename,Output_FileName :  string(1  ..Max_Str_Len)  :=  (others  => ' '); 
Name_Last :  Natural; 

lnput_Line :  string(1..Max_Str_Len); 
lnput_Last :  natural; 


lnput_DataFile,Output_DataFile,DipList_lnFile :  Text_IO.File_Type; 

ln_Vocab :  boolean; 

Splice_Count :  natural; 

Splicing :  Boolean; 

Splice_Array ;  array(1..Max_Str_Len,1..2)  of  Phoneme; 

Mid_Splice_Enable ;  Boolean  :=  false;  --  toggled  on  to  allow  mid-word  splicing 

^_*********w*******w***w**********************ww*ww**********w*******www«*ww 

-*  GET_ATOM  -  Gets  next  atom  from  a  str 


Procedure  Get_Atom(ln_Str :  Str;Atom_Str,Tail_Str :  out  Str)  is 
i,a_start,a_end,a_len,tjen  :  integer  :=  1 ; 

Begin 

If  ln_Str.Len  >  0  then 

While  (ln_Str.Txt(i)  = ' ')  and  (i  <  ln_Str.Len)  loop 
i  :=  i  +  1 ; 
end  loop; 

If  ln_Str.Txt(i)  /= ' '  then 
a_start  :=  i; 

While  (ln_Str.Txt(i)  /= ' ')  and  (i  <  ln_Str.Len)  loop 
i  :=  i  +  1 ; 
end  loop; 

if  ln_Str.Txt(i)  = ' '  then 
a_end  :=  i  - 1 ; 
else 

a_end  :=  i; 
end  if; 

ajen  :=  a_end  -  a_start  + 1; 
t_len  :=  ln_Str.Len  -  a_end; 

TaiLStr.Len  :=  tjen; 

Atom_Str.Len  :=  a_len; 
for  i  in  1  ..aJen  loop 

Atom_Str.Txt(i)  :=  ln_Str.Txt{a_start  +  i  - 1); 
end  loop; 

fori  in  1..t_len  loop 
TaiLStr.Txt(i)  :=  ln_Str.Txt(a_end  +  i); 
end  loop; 

else 

Atom_Str.Len  :=  0; 

TaiLStr.Len  :=  0; 
end  if; 
else 

Atom_Str.Len  :=  0; 

TaiLStr.Len  :=  0; 
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end  if; 

end  Get_Atom: 

^^★**************************'************«*********<***********i^************* 

--*  GET_PHONEME  -  Returns  Phoneme  corresponding  to  an  atom  (atom  must  be  free 
of  leading  &  trailing  blanks  &  other  garbage  (accent  digits  on  end  OK) 

Function  Get_Phoneme  (ln_Atom  :  Str)  return  Phoneme  is 
i,Phn_End :  Integer; 

Phn_String :  String  (1..Max_Phn_Len); 

Out_Phoneme :  Phoneme; 

Begin 

Phn_End  :=  ln_Atom.Len; 
if  ln_Atom.Len  >  0  then 

While  ((ln_Atom.Txt(Phn_End)  in  ’0'..'9'))  and  (Phn_End  >  1)  loop 
Phn_End  :=  Phn_End - 1 ; 
end  loop; 
end  if; 

If  (Phn_End  >  0)  and  (Phn_End  <=  Max_Phn_Len)  then 
Phn_String(1  ..Phn_End)  :=  ln_Atom.Txt(1..Phn_End); 
begin  -  block  to  catch  constraint  error  if  not  a  valid  phoneme 
Out_Phoneme  :=  PhonemeValue(Phn_String(1..Phn_End)); 
return  Out_Phoneme; 
exception 

when  constraint_error  => 
return  XXX; 
end;  --  block 
else 

return  XXX; 
end  if; 

end  Get_Phoneme; 


-*  PUT_PHON  -  A  Put  routine  that  translates  from  my  enum 
"*  representations  to  standard  symbols 

^^*****WWW'*****W******W****W*W*W***4r*4r**************‘**4r4rW********4r****W****w 

Procedure  Put_Phon(0_File :  File_Type;  Phon :  Phoneme)  is 
begin 

If  Phon  =  HX  then 
Put(0_File,''Hr); 
elsif  Phon  =  XXX  then 
Put(0_File, "???"); 
else 

Put(0_File,Phoneme'image(Phon)); 
end  if; 

end  Put_Phon; 


GET_FILE  -  Prompts  user  for  filename,  opens  file,  &  returns  name  &  handle 

^^it******iHHr**it**it****itiHt**it*****itit***ilf**itit*1Ht**1r*******it*1r****ilr**1r******iHHt 

Procedure  Get_File(Prompt_String :  String; 

F_Name :  out  String; 
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F_Mode :  File_Mode; 

F_Handle  :  in  out  File_Type)  is 
Str_Last :  Integer; 

Str_Text :  String(1..Max_Str_Len)  :=  (others  =>  ' '); 
begin 


loop  "  loop  until  valid  filename  entered  or  user  aborts  ('O) 

Begin  -  block  statements  to  allow  another  try  if  bad  filename  entered 

Put(Prompt_String); 

Get_Line(Str_T  ext,Str_Last) ; 


if  F_Mode  =  ln_File  then 
Text_IO.Open(File  =>  F_Handle, 

Mode  =>  F_Mode, 

Name  =>  Str_Text(1..Str_Last)); 

else 

Text_IO.Create(File  =>  F_Handle, 

Mode  =>  F_Mode, 

Name  =>  Str_Text(1..Str_Last)); 

end  if; 

F_Name  :=  Str_Text; 


exception 

When  Name_Error  =>  Put(Ascii.Bel); 

Put_Line("***  Unable  to  open "  &  Str_Text(1..Str_Last)); 
New_Line; 

When  others  =>  raise; 

end; 

end  loop; 
end  Get_File; 


^^****it****ifk*****it**’k***icit********it*it**itifk****1fk***1Ht**it*itit*it*itii*i(1t**it1tirifk-k 

—*  INIT_MATRIX  -  Set  matrix  cells  to  one  if  that  diphone  is  listed  in 
diplist.txt. 

_ ********************************www*****w********w******w*****w*********** 

Procedure  lnit_Matrix  is 

lnput_Line :  String(1..Max_Str_Len); 
lnput_Leist :  natural; 

First_Phn_Str,Last_Phn_Str,Line_Str, Tail, Temp :  Str; 

Row, Col :  Phoneme; 

Begin 

Text_IO.Open(File  =>  DipList_lnFile, 

Mode  =>  ln_File, 

Name  =>  DipList_Filename): 

While  NOT  End_Of_File(DipList_lnFile)  loop 
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Get_Line(DipList_lnFile,lnput_Line,lnput_last): 

Line_Str.Txt(1.,lnput_Last)  :=  lnput_Line(1  ..lnput_Last): 
Line_Str.Len  :=  lnput_Last: 

Get_Atom(Line_Str,First_Phn_Str,Tail):  --  fetch  first  phoneme  str 
Get_Atom(Tail,Last_Phn_Str,Temp);  --  fetch  last  phoneme  str 


Row  :=  Get_Phoneme(First_Phn_Str): 

Col  :=  Get_Phoneme(Last_Phn_Str); 

Diphone_Matrix(row,col)  :=  1 ; 

end  loop; 

TextJO.CIose(DipList_lnFile); 
end  lnit_Matrix; 

-*  VOCAB  Main  Program 

^^*********************4r**ir*1lr  *********************************************** 


Begin  -  Vocab 

if  (Arg.Count  >  2)  and  then  (Arg.Data(3)  =  7d“  or  -  process  command  line 

Arg.Data(3)  =  “/D")  then 

Debugging  :=  True; 

elsif  (Arg.Count  >  2)  and  then  (Arg.Data(3)  =  7m"  or  --  process  command  line 

Arg.Data(3)  =  "/M")  then 

Mid_Splice_Enable  :=  True; 

elsif  Arg.Count  >  2  then  --  program  name  is  1st  entry  in  array 
raise  Usage_Error; 
end  if; 

New_Line(2);  --  blank  lines  &  program  "banner" 

Put_Line(Version) ; 

New_Line(2); 

if  (Arg.Count  >  1 )  then  -  if  input  filename  on  cmd  line,  open  it.  else  ask 
begin 

Text_IO.Open(File  =>  lnput_DataFile, 

Mode  =>  ln_File, 

Name  =>  Arg.Data(2)); 

lnput_Filename(1..Arg.Data(2)'length)  :=  Arg.Data(2); 
exception  -  handle  bad  filenames  from  command  line 

When  Name_Error  I 
Status_error  I 
Mode_error  I 
Use_error  I 

Device_error  =>  New_Line(3); 

Put_Line("***  Unable  to  open  "  &  Arg.Data(2)); 
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New_Line; 
raise  Usage_Error; 


end; 

else 

Get_FileCEnter  name  of  input  file: ", 
lnput_Filename, 
ln_File, 

lnput_Datafile); 

end  if; 


~  get  output  filename  from  user 
Get_File(’'Enter  name  of  output  file: ", 

Output_Filename, 

Out_File, 

Output_Datafile); 

If  Debugging  then 
New_Line; 

Put_Line(" - [  PROCESSING  FILE  ] - "); 

end  if; 

lnit_Matrix; 

While  NOT  End_Of_File(lnput_DataFile)  loop 
Get_Line(lnput_DataFile,lnput_Line,lnput_Last); 

--  if  not  a  comment  line 

if  lnput_Last  >  0  and  then  lnput_Line(1)  in  'A'..'z'  then 
if  debugging  then 

Put_Line(lnput_Line(1..lnput_Last)); 
end  if; 

for  i  in  1  ..lnput_Last  loop  -  remove  slashes,  don't  need  them, 
if  lnput_Line(i)  =  'P  then 
lnput_Line(i)  := ' '; 
end  if; 
end  loop; 

Temp.Txt(1..lnput_Last)  :=  lnput_Line(1  ..lnput_Last); 

Temp.Len  :=  lnput_Last; 

Get_Atom(Temp, Atom, Tail);  --  get  the  word 
Word  :=  Atom;  --  save  it 

if  Debugging  then 

Put_Line("Word  =  "  &  Word.Txt(1..Word.Len)); 
end  if; 

Last_Phoneme  :=  HX;  -  Consider  silence  &  first  phoneme  to  be  a  diphone 
ln_Vocab  :=  true; 

Splicing  :=  false;  --  This  flag  will  go  true  if  must  splice  around  a  dip 
Splice_Count  :=  0;  --  count  splices  for  each  word 

loop  --  for  each  phoneme  on  the  line  (in  the  word) 
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Temp  ;=Tail: 

Get_Atom(Temp, Atom, Tail); 
if  Atom.Len  >  0  then 

This_Phoneme  :=  Get_Phoneme(Atom); 

if  Debugging  then 
Put_Line("  Diphone  = "  & 

Phoneme'lmage(Last_Phoneme)  &  7"  & 
Phoneme'l  mage(This_Phoneme)) ; 

end  if; 


--  include  word  if  composed  of  diphones  in  diplist.txt 
if  (Diphone_Matrix(Last_Phoneme,This_Phoneme)  =  0)  and 
((Mid_Splice_Enable  =  true)  or  {Last_Phoneme  =  HX))  and 
(Splicing  =  false)  then  --  splice  around  if  just  one  missing 
Splicing  :=  true; 

Splice_Count  :=  Splice_Count  + 1; 
Splice_Array(Splice_Count,1)  :=  Last_Phoneme; 
Splice_Array(Splice_Count,2)  :=  This_Phoneme; 
elsif  Diphone_Matrix(Last_Phoneme,This_Phoneme)  =  0  then 
ln_Vocab  :=  false;  -  can't  splice  if  two  diphones  missing 
else 

Splicing  :=  false;  --  not  splicing  if  diphone  found 
end  if; 

Last_Phoneme  :=  This_Phoneme; 
end  if; 

exit  when  Tail.Len  <=  0; 
end  loop; 

if  Debugging  then 

Put_Line("  Diphone  =  "  & 

Phoneme’lmage(Last_Phoneme)  &  T  & 
Phoneme’lmage(HX)); 

end  if; 


--  Consider  last  phoneme  &  silence  to  be  a  diphone 
if  (Diphone_Matrix(Last_Phoneme,HX)  =  0)  and  (Splicing  =  false)  then 
Splice_Count  :=  Splice_Count  +  1; 

Splice_Array(Splice_Count,1)  :=  Last_Phoneme; 
Splice_Array(Splice_Count,2)  :=  HX; 
elsif  Diphone_Matrix(Last_Phoneme,HX)  =  0  then 
ln_Vocab  :=  false; 
end  if; 


-  output  word  if  it  was  composed  of  diphones  in  diplist.txt 
if  ln_Vocab  then 

Text_IO.Put(Output_Datafile,Word.Txt(1..Word.Len)); 
Text_IO.Put(Output_Datafile,',‘); 
Text_IO.Put(Output_Datafile,integer'image(splice_count)); 
if  (Splice_Count  >  0)  then 
for  i  in  1  ..splice_count  loop 
T extJO.  Put(Output_Dataf  ile,',') ; 
Put_Phon(Output_Datafile,Splice_Array(i,1)); 

T extJO .  Put(Output_Dataf  ile,'/") ; 
Put_Phon(Output_Datafile,Splice_Array(i,2)); 
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end  loop: 
end  if; 

Text_IO.New_Line{Output_Datafile): 
end  if; 

end  if; 

end  loop; 

If  Debugging  then 

Put_Line(" - [  DONE  ] - 

New_Line; 
end  if; 

Text_IO.CIose(lnput_Datafile): 

Text_IO.CIose(Output_Datafile): 

exception 

When  Usage_Error  => 

Put(Ascii.Bel); 

New_Line(3); 

Put_Line{"Usage:  Vocab  infilename  [-d]"); 
New_Line: 

Put_Line("  /d  =  enable  debugging"); 

Put_Line("  /m  =  enable  mid-word  splices"); 

New_Line; 
return; 

end  Vocab; 
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H.  MAKWLIST.ADA 


-•*  PROGRAM  NAME:  MakWList  (MakWList.ada) 

»•** 

-**  AUTHOR:  Maj  Mark  Cantrell,  USMC  (cantrell@nps.navy.mil) 

_ *★ 

--**  DESCRIPTION:  Program  for  reading  a  list  of  TIMIT  *.WRD 
”**  files  &  generating  a  new  list  with  occurences  of  each  word 
--**  contained  in  a  vacabulary  file 

--**  DATE  MODIFIED:  21  Jan,  1996  (see  below  for  version  number) 

-»**********************************♦*********************************★* 

with  Arg, 

Cody_Math, 

Text_IO; 

useTextJO; 

procedure  MakWList  is 

package  Long_lnt_IO  is  new  Text_IO.Integer_IO(Long_lnteger);  use  LongJnt_IO: 


Max_Str_Len :  constant  :=  255; 

Max_Word_Len  :  constant  :=  20; 

Max_Num_Words  :  constant  :=  1000; 

Vocab_Filename  :  constant  string  :=  "VOCAB.TXT"; 

Usage_Error :  exception; 

SubType  Word_Str  is  String(1..Max_Word_Len); 

Vocab_Array  :  array(1..Max_Num_Words)  of  Word_Str  := 

(others  =>  (others  =>  ' ')); 

Debugging  :  boolean  :=  false; 

lnput_Filename,Output_FileName  :  string(1..Max_Str_Len)  :=  (others  =>  ' '); 
Name_Last :  Natural; 

lnput_Line :  string(1..Max_Str_Len); 
lnput_Last :  natural; 

Version  :  constant  String  :=  “MakWList  Version  #1.0"; 
lnput_DataFile,Output_DataFile :  Text_IO.File_Type; 

Num_Words  :  natural  :=  0; 
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_^***********************w*******************************************^****** 

“*  UPPER_CASE  -  Returns  upper  case  version  of  input  string 
_**********★♦*****★★******★★*********★***★*★******************«*★★***♦***** 

Function  Upper_Case(ln_String :  string)  return  string  is 
Result_String :  String(ln_Stringfirst..ln_String'last); 

Begin 

for  index  in  ln_Stringfirst..ln_String'last  loop 
If  ln_String(index)  in  'a'..'z'  then 

Result_String(index)  :=  character'val(character‘pos(ln_String(index)) 

-  character'pos('a') 

+  character'pos('A')): 

else 

Result_String(lndex)  :=  ln_String(index): 
end  if; 

end  loop; 

return  Result_String; 
end  Upper_Case; 

GET_FILE  -  Prompts  user  for  filename,  opens  file,  &  returns  name  &  handle 
************************************************************************** 


Procedure  Get_File(Prompt_String :  String; 

F_Name :  out  String; 

F_Mode :  File_Mode; 

F_Handle  :  in  out  File_Type)  is 
Str_Last :  Integer; 

Str_Text :  String(1  ..Max_Str_Len)  :=  (others  =>  ' '); 
begin 


loop  "  loop  until  valid  filename  entered  or  user  aborts  (^) 

Begin  --  block  statements  to  allow  another  try  if  bad  filename  entered 

Put(Prompt_String); 

Get_Line(Str_Text,Str_Last); 


if  F_Mode  =  ln_File  then 
Text_IO.Open(File  =>  F_Handle, 

Mode  =>  F_Mode, 

Name  =>  Str_Text(1..Str_Last)); 

else 

Text_IO.Create(File  =>  F_Handle, 

Mode  =>  F_Mode, 

Name  =>  Str_Text(1..Str_Last)); 

end  if; 

F_Name  :=  Str_Text; 
exit; 
exception 

When  Name_Error  =>  Put(Ascii.Bel); 

Put_Line("***  Unable  to  open "  &  Str_Text(1..Str_Last)); 
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New_Line; 

When  others  =>  raise; 

end; 

end  loop; 
end  Get_File; 

_ ★★*★***★***★*★***********★******★**★★*★★*★****★*★********★***★*★★****♦ 

-*  IN_VOCAB  -  Determine  if  a  word  is  in  the  vocabulary 


Function  ln_Vocab(Key :  String)  return  boolean  is 

mid :  natural; 
s :  natural  :=  1 ; 
e :  natural  :=  Num_Words; 
found  :  boolean  :=  false; 

begin 

If  debugging  then 
Text_IO.Put("Checking  for:  <"); 

T ext_IO.  Put(Key); 

Text_IO.Put_Line(">“); 
end  if; 

while  (found  =  false)  and  (S  <=  E)  loop 
Mid  :=  integer(Cody_Math.round(float(S+E)/2.0)); 

If  debugging  then 

Text_IO.Put("Comparing  to:  <"); 
Text_IO.Put^ocab_Array(Mid)); 

T ext_IO.Put_Line(">") ; 
end  if; 

If  Vocab_Array(mid)  =  Key  then 
found  :=  true; 

elsif  Key  <=  Vocab_Array(mid)  then 
e:=  Mid-1; 

else 

s  :=  Mid  +  1 ; 
end  if; 
end  loop; 

If  found  then 
return  true; 
else 

return  false; 
end  if; 

end  ln_Vocab; 


_^********WW*********W*W************W**********WW***********^*******^j^^*^*** 

"*  PROCESS_LINE  -  Process  line  from  wrd  file  list  &  output  if  that  word 
is  in  vocabulary 

_^****************************************************w**ww*******w**ww***** 
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Procedure  Process_Line(File_Line,Wav_Filename :  string)  is 


Word_Read  :  String(1  ..Max_Word_Len)  :=  (others  => ' '); 
Starl_Sample,End_Sample :  Long_lnteger; 
lndex_1,lndex_2,Word_lndex :  natural; 

begin 

Long_lntJo.Qet(File_Line,Start_Sample,lndex_1); 

Long_lnt_lo.Get(File_Line(lndex_1  +1  ..File_Line'length),End_Sample,  lndex_2): 
Word_lndex  :=  0; 

For  i  in  lndex_2+1..File_Line'length  loop 
if  File_Line(i)  /= ' '  then 

Wordjndex  :=  Wordjndex  +  1; 

Word_Read(Word_lndex)  :=  File_Line(i); 
end  if; 
end  loop; 

If  debugging  then 
TextJO.Put("File_Line  is:  <"); 

Text_IO.Put(File_Line): 

Text_IO.Put_Line(">"); 
end  if; 

if  ln_Vocab(Upper_Case(Word_Read))  then 
Text_lo.Put(Output_Datafile,Wav_Filename); 

T ext_lo.Put(Output_Datafile,' '); 

Text_lo.Put(Output_Datafile,Word_Read(1..Word_lndex)); 
Text_lo.Put(Output_Datafile,' '); 
Long_lntJo.Put(Output_Datafile,Start_Sample,0); 
TextJo.Put(Output_Datafile,' '); 
Long_lnt_lo.Put(Output_Datafile,End_Sample,0); 
Text_IO.New_Line(Output_Datafile); 
end  if; 

end  Process_Line; 


PROCESS_WRD_FILE  -  Process  lines  from  a  word  file:  pass  each  line 
-*  of  each  wrd  file  to  processjine  for  vocab  check  &  possible  output 

Procedure  Process_Wrd_File{Wrd_Filename  :  String)  is 

Dot_Loc :  natural  :=  0; 

Wrdjnfile :  Text_IO.File_Type; 

Word_Line :  String(1  ..Max_Str_Len); 

Line_Last :  natural; 

WAV_Filename  :  String(1,..Wrd_Filename'length)  :=  Wrd_Filename; 
begin 

Text_IO.Open(File  =>  WrdJnFile, 

Mode  =>  ln_File, 

Name  =>  Wrd_Filename); 

loop  -  determine  WAV  filename 
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dotjoc  :=  dotjoc  + 1 ; 

exit  when  (Dot_Loc  >  Wrd_Filename'length)  or  else  {Wrd_Filename(Dot_Loc)  =  ; 

end  loop; 

WAV_Filename(Dot_Loc+1..Dot_Loc+3)  :=  “WAV"; 

--  process  all  lines  in  WRD  file 
while  NOT  End_of_Flle(Wrd_lnFile)  loop 
Text_lo.Get_Line(WrdJnFlle,Word_Line,Line_Last): 
Process_Line(Word_Line(1..Line_Last),WAV_Filenanne): 
end  loop: 

Text_IO.CIose(Wrd_lnflle); 
end  Process_Wrd_File; 

-*  PROCESS_FILELIST  -  Process  lines  from  a  file  list  file  &  pass  each  line 
of  each  wrd  file  to  process_wrd_file 

Procedure  Process_Filelist(FileListJnfile :  TextJO.File_Type)  is 
FileName_Line :  String(1..Max_Str_Len): 

Filename_Last :  Natural; 
begin 


-  process  all  lines  in  file  list  file 
while  NOT  End_of_File(FileList_lnFile)  loop 
TextJo.Get_Line(FileList_lnFile,Filename_Line,Filename_Last); 

If  debugging  then 

TextJO.Put("File  Name  is;  <"); 
Text_IO.Put(Filename_Line(1..Filename_Last)): 
Text_IO.Put_Line(“>"); 
end  if; 


Process_WRD_File(Filename_Line(1..Filename_Last)); 
end  loop; 

end  Process_FileList; 


__****♦************************************************★********«■***«******* 

Read_Vocab  -  Read  words  from  vocab  file  into  vocab  array 
_ ************************************★★*****★*★**★★★*»*★**★★»*★★»★★★»♦**»*★ 


Procedure  Read_Vocab  is 

Vocabjnfile :  Text_IO.File_Type; 
Word_Line :  String(1..Max_Str_Len): 
Word_Last,Word_End :  Natural; 
Word_Read :  String(1..Max_Word_Len); 

begin 

Text_IO.Open(File  =>  VocabJnFile, 
Mode  =>  ln_File, 

Name  =>  Vocab_Filename); 
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"  process  all  lines  in  WRD  file 
while  NOT  End_of_File(Vocab_lnFile)  loop 
Text_lo.Get_Line(Vocab_lnFlle,Worcl_Line,Word_Last): 
Word_Read  :=  (others  => '  ’); 

Word_End  :=  0; 

loop  --  determine  word  read 

word_end  :=  word_end  + 1 ; 
exit  when  (Word_End  >  Max_Word_Len)  or  else 
(Word_End  >  Word_Last)  or  else 
(Word_Line(Word_End)  = 
Word_Read(Word_End)  :=  Word_Line(Word_End); 
end  loop; 

Num_Words  :=  Num_Words  +  1 ; 
Vocab_Array(Num_Words)  :=  Upper_Case(Word_Read); 
end  loop; 

TextJO.CIose(Vocab_lnFile); 

If  debugging  then 
TextJO.Put_Line(“Vocab  is:"); 

For  i  in  1..Num_Words  loop 

Text_IO.Put_Line(Vocab_Array(i)); 
end  loop; 
end  if; 

end  Read_Vocab; 


-*  MakWList  Main  Program 


Begin  --  MakWList 

if  (Arg.Count  >  2)  and  then  {Arg.Data(3)  =  "/d"  or  -  process  command  line 

Arg.Data(3)  =  "/D"  or 
Arg.Data(3)  =  "-d"  or 
Arg.Data(3)  =  "-D"  or 
Arg.Data(3)  =  "d"  or 
Arg.Data(3)  =  "D")  then 

Debugging  :=  True; 

elsif  Arg.Count  >  2  then  --  note  that  program  name  is  1st  entry  in  array 
raise  Usage_Error; 
end  if; 

New_Line(2);  --  blank  lines  &  program  "banner" 

Put_Line(  Version); 

New_Line(2); 

if  (Arg.Count  >  1 )  then  --  if  input  filename  on  cmd  line,  open  it.  else  ask 
begin 

Text_IO.Open(File  =>  lnput_DataFile, 

Mode  =>  ln_File, 

Name  =>  Arg.Data(2)); 

lnput_Filename(1..Arg.Data(2)1ength)  :=  Arg.Data(2); 
exception  --  handle  bad  filenames  from  command  line 
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When  Name_Error  I 
Status_error  I 
Mode_error  I 
Use_eiTor  I 

Device_error  =>  New_Line(3): 

Put_Line("***  Unable  to  open "  &  Arg.Data(2)): 

New_Line: 

raise  Usage_Error; 

end; 

else 

Get_File("Enter  name  of  input  file: 
lnput_Filename, 
ln_File, 

lnput_Datafile): 

end  if; 


--  get  output  filename  from  user 
Get_File("Enter  name  of  output  file: ", 

Output_Filename, 

Out_File, 

Output_Datafile); 

If  Debugging  then 
New_Line; 

Put_Line(" - [  READING  VOCAB.TXT  ] - "); 

end  if; 

Read_Vocab; 

If  Debugging  then 

Put_Line(“ - 

New_Line(2); 

Put_Line(" - 

end  if; 

Process_Filelist(lnput_Datafile); 

If  Debugging  then 

Put_Line(" - [  DONE  ] - “); 

New_Line; 
end  if; 

Text_IO.CIose(lnput_Datafile); 

Text_IO.CIose(Output_Datafile); 

exception 

When  Usage_Error  => 

Put(Ascii.Bel); 

New_Line(3); 

Put_Line("Usage:  MakWList  infilename  [-d]"); 

New_Line; 

Put_Line("  -d  =  enable  debugging"); 

New_Line; 
return; 

end  MakWList; 


[  DONE  READING  VOCAB.TXT  ] - "); 

-[  PROCESSING  FILE  LIST] - "); 
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L  PHNDURS.ADA 


__***★***★******★★*★★★★★★**★★*★★★★★****★★★**★*♦***★**★*★*★***★**★***♦*** 

^^^★★★★★★★★★★★★★★★★****************************************************** 

--**  PROGRAM  NAME:  PhnDurs  (PhnDurs.ada) 

_ ** 

--**  AUTHOR:  Maj  Mark  Cantrell,  USMC  (cantrell@nps.navy.mil) 

--**  DESCRIPTION:  Program  for  reading  a  list  of  phonetic  transcription 
“**  files  &  recording  max  diphone  durations  listed 

--**  DATE  MODIFIED:  17  Sep,  1995  (see  below  for  version  number) 
_** 


with  Arg, 

TextJO; 

use  Text_IO: 

procedure  PhnDurs  is 

Max_Str_Len  :  constant  :=  255; 

Max_Phn_Len  :  constant  :=  5; 

CoLWidths  :  constant  :=  6; 

Fs  :  constant  :=  16000.0; 

FList_Filename  :  constant  string  :=  "DIPFLIST.TXT"; 

Usage_Error :  exception; 

Type  STR  is  record 
Txt :  String(1..Max_Str_Len); 

Len  :  Integer  range  O..Max_Str_Len  :=  0; 
end  record; 


Type  Phn_Rec  is  record 
Min :  integer  :=  integer’Iast; 

Max :  integer  :=  0; 

Sum  :  float  :=  0.0; 

Count :  Integer  :=  0; 
end  record; 

Temp,Tail,Atom,Atom_Str,TaiLStr,Phon_Str :  STR; 


Type  Phoneme  is  (AA,AE,AH,AO,AW,AY,B,CH,D,DH,EH,ER,EY,  --  1st  3  lines  from  CMU  lex 
F,G,HH,IH,IY,JH,K,L,M,N,NG,OW,OY,P,R, 

S,SH,T,TH,UH,UW,V,W,Y,Z,ZH, 

EM,EN,ENG,EL,AX,IX,AXR,HX,  --this  line  from  TIM  IT 
XXX);  ~  use  XXX  for  unknown  codes 

Phoneme_Array :  array(Phoneme)  of  Phn_Rec; 
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-  Max_Dip_X,Max_Dip_Y :  Phoneme  :=  HX; 

Max_Duration :  Integer  :=  0; 

Max_Fname :  Str; 

Debugging :  boolean  :=  false; 

lnput_Filename,Output_FileName :  string(1  ..Max_Str_Len)  ;=  (others  => ' '); 
Name.Last :  Natural; 

lnput_Line :  string(1..Max_Str_Len); 
lnput_Last :  natural; 

Version :  constant  String  :=  "PhnDurs  Version  #1.0"; 
lnput_DataFile,Output_DataFile,FList_File :  Text_IO.File_Type; 


UPPER_CASE  -  Returns  upper  case  version  of  input  string 
—*********************************************************************1#**** 

Function  Upper_Case(ln_String :  string)  return  string  is 
Result_String :  String(ln_String'first..ln_String'last); 

Begin 

for  index  in  ln_Stringfirst..ln_String'last  loop 
If  ln_String(index)  in  'a'..'z'  then 

Result_String(index)  :=  characterVal(character'pos(ln_String(index)) 
-  character’posCa') 

+  character’posCA')); 

else 

Result_String(index)  :=  ln_String(index); 
end  if; 

end  loop; 

return  Result_String: 
end  Upper_Case; 

GET_ATOM  -  Gets  next  atom  from  a  str 

^^******-k*'k***-k'k**it****ic*itit******ifkit1fk*ifk1t*1tititiiir'k*ie**it**it*iifk*it**ifkii’kit*1t****it 


Procedure  Get_Atom(ln_Str :  Str;Atom_Str,Tail_Str :  out  Str)  is 
i,a_start,a_end,a_len,tjen  :  integer  :=  1 ; 

Begin 

If  ln_Str.Len  >  0  then 

While  (ln_Str.Txt(i)  = ' ')  and  (i  <  ln_Str.Len)  loop 
i  1=  i  +  1 ; 
end  loop; 

If  ln_Str.Txt(i)  /= '  ‘  then 
a_start  :=  i; 

While  (ln_Str.Txt(i)  /=  ' ')  and  (i  <  ln_Str.Len)  loop 
i  :=  i  + 1; 
end  loop; 

if  ln_Str.Txt(i)  = ' '  then 
a_end  :=  i  - 1 ; 
else 

a_end  :=  i; 


223 


end  if; 

ajen  :=  a_end  -  a_start  + 1; 
tjen  :=  ln_Str.Len  -  a_end; 

TaiLStr.Len  :=  tJen; 

Atom_Str.Len  :=  aJen; 
for  i  in  1  ..ajen  loop 

Atom_Str.Txt(i)  :=  ln_Str.Txt(a_start  +  i  - 1); 
end  loop; 

for  i  in  1..tjen  loop 

TaiLStr.Txt(i)  :=  ln_Str.Txt(a_end  +  i); 
end  loop; 

else 

Atom_Str.Len  :=  0; 

TaiLStr.Len  :=  0; 
end  if; 
else 

Atom_Str.Len  :=  0; 

TaiLStr.Len  :=  0; 
end  if; 

end  GeLAtom; 

^^****************«*******************************4-*******************A-***** 

-*  GET_PHONEME  -  Returns  Phoneme  corresponding  to  an  atom  (atom  must  be  free 
of  leading  &  trailing  blanks  &  other  garbage  (accent  digits  on  end  OK) 


Function  Get_Phoneme  (ln_Atom :  Str)  return  Phoneme  is 
i,Phn_End :  Integer; 

Phn_String :  String  (1..Max_Phn_Len); 

OuLPhoneme :  Phoneme; 

Begin 

Phn_End  :=  ln_Atom.Len; 
if  ln_Atom.Len  >  0  then 

While  ((ln_Atom.Txt(Phn_End)  in  '0'..'9'))  and  (Phn_End  >  1)  loop 
Phn_End  :=  Phn_End-1; 
end  loop; 
end  if; 

If  (Phn_End  >  0)  and  (Phn_End  <=  Max_Phn_Len)  then 
Phn_String(1  ..Phn_End)  :=  Upper_Case(ln_Atom.Txt(1..Phn_End)); 

begin  ~  block  to  catch  constraint  error  if  not  a  valid  phoneme 
OuLPhoneme  :=  Phoneme'Value(Phn_String(1..Phn_End)); 
return  OuLPhoneme; 
exception 

when  constraint_error  => 

if  Phn_End  =  2  and  then  Phn_String(1..2)  =  "H#"  then 
return  HX; 
else 

return  XXX; 
end  if; 

end;  -  block 
else 

return  XXX; 
end  if; 

end  GeLPhoneme; 
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-■*  ECHO_DATA  -  Used  by  Process_Phn_File  to  echo  the  data  read  from  file 
*************************************************************************** 


Procedure  Echo_Data  (Start_N,  End_N  :  Longjnteger; 

Phon_Str :  Str; 

Phoneme_Read :  Phoneme)  is 


begin 

if  Debugging  then 

Put_Line("  Start  = "  &  longJnteger'image(Start_N)); 

Put_Line("  End  = "  &  long_integer'image(End_N)); 

Put_Line("  Phoneme  &  Phon_Str.Txt(1  ..Phon_Str.Len)  & 

"  &  Phoneme'image(Phoneme_Read)); 

end  if; 

end  Echo_Data: 

__******★***♦***★*★*★**♦****★*♦*★*************♦**♦*******♦*****★*******★**** 

“*  PUT_PHON  -  A  Put  routine  that  translates  from  my  enum 
representations  to  standard  symbols.  Used  by  Display_Array. 


Procedure  Put_Phon(0_File  :  File_Type:  Phon  :  Phoneme)  is 
begin 

If  Phon  =  HX  then 
Put(0_File,"H#"): 
elsif  Phon  =  XXX  then 
Put(0_File, “???"): 
else 

Put(0_File,Phoneme'image(Phon)): 
end  if; 

end  Put_Phon; 

-*  Display_Array  -  Prints  final  results  to  output  file 


Procedure  Display_Array  is 
row :  Phoneme: 
avg ;  float; 
count :  integer: 
begin 

Put  Lin@(Output  Dstsfil©,*'****************************************************************************”^* 
Put_Line(Output_Datafile,"  MAX  PHONEME  DURATIONS  FOR  FILES  LISTED  IN  "  & 

lnput_Filename); 

Put  Lin6(0utput  Qatsfil©,"*********************************************************’*^******************"^* 

newJine(Output_Datafile); 

for  row  in  Phoneme_Array'range  loop 

count  :=  Phoneme_Array(row).count; 

If  count  >  0  then 
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avg  :=  Phoneme_Array(row).sum/float(Phoneme_Array(row).count): 
Put(Output_Datafile,“("); 

Put_Phon(Output_Datafile,Row);  ~  phn  name 
Put(Output_Datafile,"  ("); 

Put(Output_Datafile,integer'image(Phoneme_Array(row).Min)); 
Put(Output_Datafile," "); 

Put(Output_Datafile,integer'image(Phoneme_Array(row).Max)); 
Put(Output_Datafile,"  “); 

Put(Output_Datafile,integer'image(integer(avg))): 

Put(Output_Datafile,")"); 

Put(Output_Datafile,")"); 

newJine(Output_Datafile): 


else 


Put(Output_Dataf  lie, "(“) ; 

Put_Phon(Output_Datafile,Row); 

Put(Output_Datafile,"  (nil  nil  nil))"); 

New_Line(Output_Datafile): 

end  if; 
end  loop; 

end  Display_Array; 

GET_FILE  -  Prompts  user  for  filename,  opens  file,  &  returns  name  &  handle 


Procedure  Get_File(Prompt_String :  String; 

F_Name :  out  String; 

F_Mode :  File_Mode; 

F_Handie  :  in  out  File_Type)  is 
Str_Last :  Integer; 

Str_Text :  String(1..Max_Str_Len)  :=  (others  => ' '); 
begin 


loop  -  loop  until  valid  filename  entered  or  user  aborts  (^C) 

Begin  --  block  statements  to  allow  another  try  if  bad  filename  entered 

Put(Prompt_String); 

Get_Line(Str_Text,Str_Last); 


if  F_Mode  =  ln_File  then 
Text_IO.Open(File  =>  F_Handle, 

Mode  =>  F_Mode, 

Name  =>  Str_Text(1..Str_Last)); 

else 

TextJO.Create(File  =>  F_Handle, 
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end  if; 


Mode  =>  F_Mode, 

Name  =>  Str_Text(1..Str_Last)); 


F_Name  :=  Str_Text; 
exit; 


exception 

When  Name_Error  =>  Put{Ascii.Bel); 

Put_Line("***  Unable  to  open "  &  Str_Text(1..Str_Last)); 
New_Line; 

When  others  =>  raise; 

end; 

end  loop; 
end  Get_File; 


— *  RECORD_DURATION  -  Used  by  Process_Phn_File  to  record  a  phoneme's  duration 
"*  in  Phoneme_Array. 

^^ii****1ife****-kit1fkititit1Htit*ifk1t****ifkir****itifk1tiHt**itit*ititit*it***it*it*1t**it******ir**** 


Procedure  Record_Duration  (Start_N,End_N  :  Longjnteger; 

Phn :  Phoneme)  is 

Phn_Duration :  Integer; 
m_Sec :  Integer; 

LocaLFName :  Str; 

begin 

if  End_N  -  Start_N  <  longjnteger(integer'last)  then 
Phn_Duration  :=  integer(End_N  -  Start_N); 
else 

Phn_Duration  :=  integer'Iast; 
end  if; 

if  Phn_Duration  >  Phoneme_Array(Phn).Max  then 
Phoneme_Array{Phn).Max  :=  Phn_Duration; 
end  if; 

if  Phn_Duration  <  Phoneme_Array(Phn).Min  then 
Phoneme_Array(Phn).Min  :=  Phn_Duration; 
end  if; 

Phoneme_Array(Phn).Sum  ;=  Phoneme_Array(Phn).Sum  +  float(Phn_Duration); 
Phoneme_Array(Phn).Count  :=  Phoneme_Array(Phn).Count  +  1; 

end  Record_Duration; 


PROCESS_PHN_FILE  -  Opens  designated  file  &  records  diphone  durations 

^^**iif**************^t****************************************************^*** 

Procedure  Process_Phn_File  (Phn_Filename :  string)  is 
Phn_File :  Text_IO.File_Type; 
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Temp, Atom, Tail, This_PhnAtom,Phn_FName  :  Str; 
This_Start,This_End :  longjnteger; 

Phn_Line :  string(1  ..Max_Str_Len); 

Phn_Last :  natural; 

This_Phoneme :  Phoneme; 


begin 


-  set  up  filename  str  for  Record_Duration  calls 
Phn_FName.Txt(1..Phn_Filename'length)  :=  Phn_Filename; 
Phn_FName.Len  ;=  Phn_Filename'length; 


TextJO.Open{File  =>  Phn_File, 

Mode  =>  ln_File, 

Name  =>  Phn_Filename); 


getJine(Phn_File,Phn_Line,Phn_Last); 


if  Phn_Last  >  5  then 


While  NOT  End_Of_File(Phn_File)  loop 

if  debugging  then 
Put_Line(Phn_Line(1  ..Phn_Last)); 
end  if; 

Temp.Txt(1..Phn_Last)  :=  Upper_Case(Phn_Line(1..Phn_Last)); 
Temp.Len  :=  Phn_Last; 

Get_Atom(Temp,Atom,Tail);  --  get  the  start  N 
This_Start  :=  Long_lnteger'value(Atom.Txt(1  ..Atom.Len)); 

Temp  :=Tail; 

Get_Atom{Temp,Atom,Tail);  --  get  the  ending  N 
This_End  :=  Long_lnteger'value(Atom.Txt(1.. Atom.Len)); 

Temp  :=Tail; 

Get_Atom(Temp,Atom,Tail);  --  get  the  phoneme  string 
This_Phoneme  :=  Get_Phoneme(Atom); 

This_PhnAtom  :=  Atom; 

Echo_Data  (This_Start,This_End,Atom,This_Phoneme); 
getJine(Phn_File,Phn_Line,Phn_Last); 
Record_Duration(This_Start,This_End,This_Phoneme); 
end  loop; 
end  if; 

Close(Phn_File); 

exception 

When  Name_Error  I  Use_Error  I  Device_Error  => 

New_Line; 
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Put_Line("***  Unable  to  open  PHN  file "  &  Phn_Filename): 
New_Line; 

end  Process_Phn_File: 


^^************************************************************************** 
PhnDurs  Main  Program 

*★*★**★★★*★★***★*★★***★*★**★★*★♦*****★***★♦***********★*★**★★*♦*♦*★****♦★* 


Begin  --  PhnDurs 

if  (Arg.Count  >  2)  and  then  (Arg.Data(3)  =  7d“  or  -  process  command  line 

Arg.Data(3)  =  "/D"  or 
Arg.Data(3)  =  "-d"  or 
Arg.Data(3)  =  *-D"  or 
Arg.Data(3)  =  "d"  or 
Arg.Data{3)  =  "D”)  then 

Debugging  :=  True; 

elsif  Arg.Count  >  2  then  --  note  that  program  name  is  1st  entry  in  array 
raise  Usage_Error; 
end  if; 

New_Line(25):  -  blank  lines  &  program  “banner" 

Put_Line(Version); 

New_Line(2): 

if  (Arg.Count  >  1)  then  -  if  input  filename  on  cmd  line,  open  it.  else  ask 
begin 

TextJO.Open(File  =>  lnput_DataFile, 

Mode  =>  ln_File, 

Name  =>  Arg.Data(2)); 

lnput_Filename(1..Arg.Data(2)'length)  :=  Arg.Data(2); 
exception  --  handle  bad  filenames  from  command  line 

When  Name_Error  I 
Status_error  I 
Mode_error  I 
Use_error  I 

Device_error  =>  New_Line(3); 

Put_Line(“***  Unable  to  open  "  &  Arg.Data(2)): 

New_Line: 

raise  Usage_Error; 

end; 

else 

Get_File("Enter  name  of  input  file: 
lnput_Filename, 
ln_File, 

lnput_Datafile); 

end  if; 


-  get  output  filename  from  user 
Get_File("Enter  name  of  output  file:  ”, 

Output_Filename, 

Out_File, 

Output_Datafile); 
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If  Debugging  then 
New_Line; 

Put_Line(" - [  PROCESSING  FILE  ] - "); 

end  if; 

While  NOT  End_Of_File(lnput_DataFile)  loop 
Get_Line(lnput_DataFile,lnput_Line,lnput_Last); 
lnput_Line  :=  Upper_Case(lnput_Line); 
if  lnput_Last  >  0  then 
If  Debugging  then 

Put_Line("#### "  &  lnput_Line(1..lnput_Last)  & "  ####"); 
end  if; 

Process_Phn_File{lnput_Line(1..lnput_Last)); 
if  Debugging  then 

Put_Line("###########################################“); 
end  if; 

end  if; 

end  loop; 

Display_Array; 

If  Debugging  then 

Put_Line(" - [  DONE  ] - "); 

New_Line; 
end  if; 

Text_IO.CIose(lnput_Datafile); 

Te)ct_IO.CIose(Output_Datafile); 

exception 

When  Usage_Error  => 

Put(Ascii.Bel); 

New_Line(3); 

Put_Line("Usage:  PhnDurs  infilename  [-d]"); 

New_Line; 

Put_Line("  -d  =  enable  debugging"); 

New_Line; 

return; 

end  PhnDurs; 
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APPENDIX  C.  LISP  CODE 


A.  CONTENTS 

1.  RECOGNIZ.CL 

This  is  the  main  lexical  analyzer  code.  It  reads  a  list  of  diphone  detections  (produced  by 
RECOGNIZ.M)  and  outputs  a  list  of  word  candidates.  Note  that  this  code  is  not  limited  to  the 
diphone  list  of  Appendix  D  or  the  vocabulary  of  Appendix  E.  The  entire  TIMIT  lexicon  is 
scanned  by  the  lexical  analyzer  (and  a  larger  lexicon  may  easily  be  substituted).  If  some  of  a 
word's  diphones  are  not  among  the  neural  network's  output  classes,  the  lexical  analyzer  will 
simply  fail  to  ever  find  those  diphones  in  its  input  stream.  Appendix  E  contains  additional 
comments  on  this  issue. 

2.  PEAKS.CL 

This  file  contains  functions  for  removing  all  but  the  output  peaks  from  the  input  file  used 
by  RECOGNIZ.CL. 

3.  DICT.CL 

This  fde  contains  phonetic  dictionary  utilities. 

4.  LEXICON.TXT 

This  is  the  TIMIT  lexicon  converted  to  a  LISP  list.  The  complete  file  is  too  large  to 
include  in  this  appendix.  So  an  extract  is  provided  as  a  sample  of  the  format  used. 

5.  LEXICON2.TXT 

This  is  the  TIMIT  lexicon  converted  to  a  LISP  list.  This  version  is  sorted  by  the  second 
phoneme  in  each  word.  The  complete  file  is  too  large  to  include  in  this  appendix.  So  an  extract 
is  provided  as  a  sample  of  the  format  used. 

6.  PHNINDEX  TXT 
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This  is  an  index  containing  each  phoneme  (45  entries)  and  the  file  position  of  the  first 
word  in  LEXICON.TXT  that  starts  with  that  phoneme. 

7.  PHNINDX2  TXT 

This  is  an  index  containing  each  phoneme  (45  entries)  and  the  file  position  of  the  first 
word  in  LEXICON2.TXT  that  has  that  phoneme  as  its  second  phoneme. 

8.  PHNDURS  TXT 

This  is  a  list  of  phonemes  and  their  minimum,  average,  and  maximum  expected  durations. 
It  was  produced  by  PHNDURS.ADA  and  is  used  by  RECOGNIZ.CL  to  bound  its  search  for 
phonemes  in  a  word. 

B.  RECOGNIZ.CL 


************************************************************************** 


RECOGNIZ.CL  -  This  program  converts  a  dIphone  peak  list  into  word 
candidates. 


Written  by  Maj  Mark  E.  Cantrell,  USMC 
Version  1.3;  7  June  1996 

★************************************************************************* 


GLOBALS 

************************************************************************** 


(defvar  *diphone_peak_list*  nil)  ;  peaks  list  produced  by  dip2peak 
(defvar  ‘phonemejndex*  nil)  ;  lexicon  index  by  1st  phoneme 
(defvar  *phonemeJndex2*  nil)  ;  lexicon  index  by  2nd  phoneme 
(defvar  *phn_durs*  nil)  ;  list  of  max  &  min  phn  durations 

(defvar  *dict_file*  nil)  ;  lexicon  sorted  by  1st  phn 

(defvar  *dict_file2*  nil)  ;  lexicon  sorted  by  2nd  phn 

(defvar  *word_candidates*  nil)  ;  words  from  peaks_to_words 
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(defvar  *last_phns*  nil)  ;  phns  located  in  last  build_word  call 

(defvar  *last_peaks*  nil)  ;  peaks  located  in  last  build_word  call 

(defvar  *failed_phn*  nil)  ;  phn  that  last  build_word  call  failed  on 

(defconstant  *dummy_confidence*  0.0)  ;  confidence  assigned  to  dummy  diphones 
(defconstant  *min_conf*  0.0005)  ;  word  confidence  must  be  >  this  to  list 

(defconstant  *min_peak*  0.0005)  ;  start  peaks  with  lower  confidence  are  ignored 
(defconstant  *win_rate*  170)  ;  this  number  of  samples  between  (interleaved) 

;  window  positions 

;  LOAD  SUPPORT  CODE 


(load  "peaks")  ;  functions  for  converting  NNET  outputs  to  diphone  peaks 
(load  "diet")  ;  phonetic  dictionary  functions 


KEY  FUNCTIONS  FOR  TURNING  DIPHONE  PEAKS  INTO  WORD  CANDIDATES 


ieiciticiriticiticitieitic*ifTk’kit’kit*it*it*i(*itie*icititicit*iticitit*ic*itic*'kititit1tit-kieif'k'k'k*it***ie*icit*ic** 


PEAKS_TO  _WORDS  -  Try  each  dip  in  peak  list  (recursively)  to  see  if  it 
starts  a  word 

**********************************w*************************************** 


(defun  peaks_to_words  (peakjist) 

(cond  ((null  peakjist)  nil) 

(t  (if  (>=  (third  (first  peak_list))  *min_peak*) 
(findwords  (first  peakjist)  (rest  peakjist))) 
(peaksJo_words  (rest  peakjist))))) 


FINDWORDS  -  Find  all  possible  words  in  peak  list  starting  with  given 
diphone  peak 

★★************************************************************************ 


(defun  findwords  (start_peak  rest_peak_list) 

(cond  ((equal  (first_phn  start_peak)  'HX) 

;  if  detected  silence  at  start  of  word,  chain  from  start_peak 
(setf  *word_candidates* 
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(append  *word_candidates* 

(build_words_l  start_peak  rest_peakjist)))) 

;  else  look  for  words  starting  w/2nd  phn  in  start_peak  (no  word  gap) 
(t  (setf  *word_candidates* 

(append  *word_candidates* 

(build_words_l  start_jDeak  rest_peak_list))) 

;  and  look  for  words  starting  with  1st  phn  in  start_peak 
(setf  *word_candidates* 

(append  *word_candidates* 

(build_words_ll  start_peak  rest_peakjist))) 

;  and  words  with  2nd  phn  =  1st  phn  in  start_peak  (a  "splice") 

(setf  *word_candidates* 

(append  *word_candidates* 

(build_wordsJII  start_peak  restjaeakjist)))))) 

BUILD_WORDS_l  -  Look  for  all  words  in  peak  list  starting  from  a 
silence-first  dip 

,**★**★**★*★*★*★*★***★**★★★★★*★*★*****★★★******★******★***★**★*★****★****** 


(defun  build_words_l  (start_peak  rest .peaks) 

(let  ((file_pos  (index_pos  (second_phn  start_peak)  *phonemeJndex*)) 
(word_entry  nil) 

(nw  nil) 

(rl  nil)) 

;  position  lexicon  file  pointer  at  1st  word  starting  with  1st  phn 
(cond  (file_pos 

(file-position  *dict_file*  file_pos) 

(setf  word_entry  (read  *dict_file*  nil  nil  nil)) 

(loop  ;  attempt  to  confirm  each  word  starting  with  1st  phn 
(when  (not  (equal  (second  word_entry)  (second_phn  start_peak))) 
(return  rl))  ;  exit  when  no  more  words  In  dictionary 
;  start  with  this  phn 

;  (if  (not_already_found  word_entry  *word_candidates*  start_peak) 
(setf  nw  (build_word  (first  word_entry)  ;  word  candidate 
(list  (second  word_entry)) ;  phns  found  so  far 
(rest  (rest  word_entry))  ;  phns  yet  to  be  found 
(list  start_peak)  ;  peaks  used  so  far 
rest_peaks))  ;  rest  of  peak  list 
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(setf  nw  nil)) 


(cond  ((and  (not  (null  nw))  (>=  (confidencejn  nw)  *min_conf*)) 
(setf  rl  (append  rl  (list  (print  nw)))) 

(setf  *last_peaks*  (second  nw)) 

(setf  *last_phns*  (rest  word_entry)) 

(setf  *failed_phn*  nil)) 

((not  (null  nw)) 

(setf  *last_peaks*  (second  nw)) 

(setf  *last_phns*  (rest  word_entry)) 

(setf  *failed_phn*  nil))) 

(setf  word_entry  (read  *dict_file*  nil  nil  nil))))))) 

BUILD_WORDS_ll  -  Look  for  all  words  in  peak  list  starting  with  a  given 
phoneme 


(defun  build_words_ll  (start_peak  rest_peaks) 

(let*  ((word_entry  nil) 

(phni  (first_phn  start_peak)) 

(phn2  (second_phn  start_peak)) 

(file_pos  (index_pos  phn1  *phoneme_index*)) 

(nw  nil) 

(rl  nil)) 

;  position  lexicon  file  pointer  at  1st  word  starting  with  1st  phn 
(cond  (filejDos 

(file-position  *dict_file*  file_pos) 

(setf  word_entry  (read  *dict_file*  nil  nil  nil)) 

(loop  ;  scan  fonward  in  dictionary  till  2nd  phn  of  word  =  2nd  phn  of  dip 

(when  (or  (equal  (third  word_entry)  phn2) 

(not  (equal  (second  word_entry)  phni))) 

(return  nil)) 

(setf  word_entry  (read  *dict_file*  nil  nil  nil))) 

(loop  ;  attempt  to  confirm  each  word  starting  with  1st  phn 
(when  (not  (equal  (third  word_entry)  phn2)) 

(return  rl))  ;  exit  when  no  more  words  in  dictionary 
;  start  with  these  2  phns 

;  The  next  if  clause  is  commented  out  because  it  turned  out  to  be  very 
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;  expensive  to  determine  of  the  same  instance  of  the  same  word  was  already 
in 

;  the  output  list.  Seems  better  to  just  process  the  word  &  let  higher  levels 
;  of  analysis  sort  out  the  duplications 

;  (if  (not_already_found  word_entry  *word_candidates*  start_peak) 

(setf  nw  (build_word  (first  word_entry)  ;  word  candidate 
(list  phn1  phn2)  ;  phns  found  so  far 
(rest  (rest  (rest  word_entry))) ;  phns  yet  to  be  found 
(list  start_peak)  ;  peaks  used  so  far 
rest_peaks))  ;  rest  of  peak  list 

;  (setf  nw  nil)) 

(cond  ((and  (not  (null  nw))  (>=  (confidencejn  nw)  *min_conf*)) 

(setf  rl  (append  rl  (list  (print  nw)))) 

(setf  *last_peaks*  (second  nw)) 

(setf  *last_phns*  (rest  word_entry)) 

(setf  *failed_phn*  nil)) 

((not  (null  nw)) 

(setf  *last_peaks*  (second  nw)) 

(setf  *last_phns*  (rest  word_entry)) 

(setf  ‘failed  _phn*  nil))) 

(setf  word_entry  (read  *dict_file*  nil  nil  nil)))) 

(t  nil)))) 


BUILD_WORDS_lll  -  Look  for  all  words  in  peak  list  with  a  given  2nd 
phoneme 

w************************************************************************* 


(defun  build_words_lll  (start_jDeak  rest_peaks) 

(let*  ((word_entry  nil) 

(phn1  (first_45hn  start_peak)) 

(phn2  (second_phn  start_peak)) 

(file_pos  (index_pos  phn1  *phoneme_lndex2*)) 

(nw  nil) 

(rl  nil)) 

;  position  lexicon  file  pointer  at  1  st  word  with  given  2nd  phn 
(cond  (flle_pos 

(file-position  *dict_file2*  file_pos) 

(setf  word_entry  (read  *dict_file2*  nil  nil  nil)) 

(loop  ;  scan  forward  in  dictionary  till  3rd  phn  of  word  =  3rd  phn  of  dip 
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(when  (or  (equal  (fourth  word_entry)  phn2) 
(not  (equal  (third  word_entry)  phn1))) 
(return)) 

(setf  word_entry  (read  *dict_file2*  nil  nil  nil))) 


(loop  ;  attempt  to  confirm  each  word  starting  with  1st  phn 
(when  (or  (not  (equal  (fourth  word_entry)  phn2)) 

(not  (equal  (third  word_entry)  phni))) 

(return  rl))  ;  exit  when  no  more  words  in  dictionary 
;  have  these  2  phns  in  2nd  &  3rd  pos 


I 


far 


(if  (not_already_found  word_entry  *word_candidates*  startjjeak) 

(setf  nw  (build_word  (first  word_entry)  ;  word  candidate 

(list  (second  word_entry)  phni  phn2)  ;  phns  found  so 


(nthcdr  3  word_entry)  ;  phns  yet  to  be  found 

;  peaks  used  so  far 

(list  (dummy_peak  (second  word_entry)  phni) 

start_peak) 

rest_peaks))  ;  rest  of  peak  list 


(setf  nw  nil)) 

(cond  ((and  (not  (null  nw))  (>=  (confidencejn  nw)  *min_conf*)) 
(setf  rl  (append  rl  (list  (print  nw)))) 

(setf  *last_peaks*  (second  nw)) 

(setf  *last_phns*  (rest  word_entry)) 

(setf  *failed_phn*  nil)) 

((not  (null  nw)) 

(setf  *last_peaks*  (second  nw)) 

(setf  *last_phns*  (rest  word_entry)) 

(setf  *failed_phn*  nil))) 

(setf  word_entry  (read  *dict_file2*  nil  nil  nil)))) 

(t  nil)))) 


**★★*★**★★****★★*★*★★**★***★★*★★★**********★*★★★★*★★*★★***★***★**★******** 

BUILD_WORD  -  Take  advantage  of  phns  &  peaks  previously  found  (and  needed 

by  this  word)  before  looking  for  rest  of  phonemes  required 
★************************************************************************* 


(defun  build_word  (test_word  phnsjound  phns_needed  peaks_used  rest_peaks) 
(let  ((phns_found2  phns_found) 

(phns_needed2  phns_needed) 

(peaks_used2  peaks_used) 

(Iast_phns2  *last_phns*) 

(Iast_peaks2  *last_peaks*) 
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(recycled_phns  nil) 

(matches  0)) 

(cond  ;  if  this  word  started  at  same  peak  as  last 
((equal  (first  peaks_used)  (first  *last_peaks*)) 

(dolist  (phn  phns_found)  ;  see  if  phns  so  far  match  phns  from  last  word 
(when  (not  (equal  phn  (first  Iast_phns2)))  (return)) 

(setf  Iast_phns2  (rest  Iast_phns2)) 

(setf  matches  (+  matches  1))) 

;  if  phns  match  so  far,  look  for  more  usable  phns  in  last  word 
(cond  ((equal  matches  (length  phns_found)) 

(dolist  (phn  Iast_phns2)  ;  recycle  phns  already  proven  In  last  word  search 
(when  (not  (equal  phn  (first  phns_needed2)))  (return)) 

(setf  phns_needed2  (rest  phns_needed2)) 

(setf  phns_found2  (append  phns_found2  (list  phn))) 

(setf  recycled_phns  (append  recycled_phns  (list  phn)))) 

(dolist  (peak  peaks_used) ;  skip  over  peaks  already  listed 
(if  (equal  (first  Iast_peaks2)  peak) 

(setf  Iast_peaks2  (rest  last_4Deaks2)))) 

(setf  peaks_used2  ;  move  approp  peaks  from  last  list  to  this  one 
(recycle_peaks  Iast_peaks2 
recycled_phns 
phns_needed2 
peaks_used2)) 

;  don't  use  phn  not  found  in  peaks  used 
(cond  ((and  (not  (equal  (flrst_phn  (first  (last  peaks_used2))) 

(first  (last  phns_found2)))) 

(not  (equal  (second_phn  (first  (last  peaks_used2))) 

(first  (last  phns_found2))))) 

(setf  phns_needed2  (append  (last  phns_found2) 

phns_needed2)) 

(setf  phns_found2  (butlast  phns_found2)))) 

;  fail  this  word  if  it  depends  on  same  phn  that  failed  last  word 
(if  (and  (equal  recycled_phns  Iast_phns2) 

(equal  (first  phns_needed2)  *failed_phn*) 

(not  (null  phns_needed2))) 
nil 

(build_word2  test_word  ;  else  look  for  rest  of  word 
phns_found2 
phns_needed2 
peaks_used2 

(if  (>  (length  peaks_used2) 
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(length  peaks_usecl)) 

(alLafter  rest_peaks  (first  (last  peaks_used2))) 
rest_peaks)))) 

;  no  matches  with  last  word  so  start  from  scratch 

(t  (build_word2  test_word  phns_found  phns_needed  peaks_used 

rest_peaks)))) 

(t ;  this  peak  is  a  new  starting  point  in  peaks  so  don't  bother  trying  to  recycle 
(build_word2  test_word  phns_found  phns_needed  peaks_used  rest_peaks))))) 

;  BUILD_WORD2  -  Attempt  (recursively)  to  find  phonemes  of  given  word  in 
;  peak  list  called  by  build_wordsJ,  build_wordsJI,  and  build_wordsJII 
;  through  build_word. 


(defun  build_word2  (test_word  phns_found  phns_needed  peaks_used  rest_peaks) 
(let*  ((link  (link_dip  peaks_used  phns_needed  rest_peaks)) 

(peak_used  (first  link)) 

(rest_peaks2  (second  link)) 

(need  nil)) 

(setf  *last_peaks*  peaks_used) ;  save  results  so  far  to  avoid  search  for 
(setf  *last_phns*  phns_found)  ;  same  phns  again  on  similar  word 
(setf  *failed_phn*  (first  (last  phns_needed))) 

(cond  ((null  phns_needed)  ;  success  if  no  more  phonemes  to  find 
(list  test_word  peaks_used  (confidence_val  peaks_used))) 

((null  rest_peaks)  nil)  ;  failure  if  no  more  peaks  to  try 

(peak_used  ;  if  there  is  a  nearby  peak  with  next  phn 

(better_of  (build_word2  test_word  ;  try  to  chain  from  this  link 
(update_found  phns_found  peak_used) 

(setf  need  (update_needed  phns_needed  peak_used)) 
(append  peaks_used  (list  peak_used)) 
rest_peaks2) 

(build_word2  test_word  ;  if  one  link  doen’t  work,  try  another 
phns_found 
phns_needed 
peaks_used 
rest_peaks2))) 

;  else  if  link  to  this  phn  not  found  but  last  peak  was  real,  try  splice 
;  ((reaLpeak  (car  (last  peaks_used))) 
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(build_word2  test_word 

(append  phns_found  (list  (first  phns_needed))) 
(rest  phns_needed) 

(append  peaks_used 

(list  (dummy_peak  (car  (last  phnsjound)) 
(first  phns_needed)))) 
(skip_early  rest_peaks 

(first  (first  (last  peaks_used))) 

(list  (first  (last  phns_found)) 

(first  phns_needed) 

(second  phns_needed))))) 

(t  nil))))  ;  else  word  not  found 

,  ★******★***★***★****★****★*****★★*****★****★★****★★*★**★******★**★★**** 
, *********************************************************************** 


;  SUPPORT  FUNCTIONS  FOR  TURNING  DIPHONE  PEAKS  INTO  WORD 
CANDIDATES 

.***★*★***★***★★*★**★**★*★**★***★*********★****★*★*★*★**********★*★★★★★★ 

,★***★★**★******★*★*★★******★★**★**★★**★★*★*★*★**★********★★***★*★★***** 


LINK_DIP  -  Return  a  diphone  that  supplies  needed  phoneme  (in  1st  or  2nd 
pos) 

*********★★****★******★*****★*★*★*****★****★***★*★**★****★****★*******★★** 


(defun  link_dip  (peaksjist  phns_needed  rest_peaks) 

(let*  ((last_peak  (first  (last  peaksjist))) 

(rest_peaks2  rest_peaks) 

(min_span_start  (first  last_peak)) 

(max_span_start  min_span_start) 

(last_phn  (second_phn  last_peak)) 

(phn_needed  (first  phns_needed)) 

(second_phn_needed  (second  phns_needed)) 
(maxjast_phn_dur  (max_dur  last _phn)) 
(max_phn_needed_dur  (max_dur  phn_needed)) 
(minjast_phn_dur  (min_dur  last_phn)) 

(min_phn_needed_dur  (min_dur  phn_needed)) 
(max_both_durs  (+  maxjast_phn_dur  max_phn_needed_dur)) 
(min_both_durs  (+  min_last_phn_dur  min_phn_needed_dur)) 
(only_need_one  (equal  (length  phns_needed)  1))) 

(cond 

((null  rest_peaks)  nil) 
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(t  (cond  ((<  min_span_stai1 0)  ;  if  last  peak  was  dummy  peak 

(setf  min_span_start  ;  estimate  missing  start  pos 
(-  (first  (first  rest_peaks)) 

*win_rate*)) 

(setf  max_span_start  (first  (first  rest_peaks))))) 

(doiist  (peak  rest_peaks) 

(let  ((min_span  {-  (first  peak)  max_span_start)) 

(max_span  (-  (first  peak)  min_span_start)) 

(first_phn_peak  (first_phn  peak)) 

(second_phn_peak  (second_phn  peak))) 

(setf  rest_peaks2  (rest  rest_peaks2)) 

(cond  ((>  min_span  (+  max_iast_phn_dur  max__phn_needed_dur)) 

(return  nil))  ;  if  gone  too  far,  return  nil 

((and  (equal  first_phn_peak  last_phn)  ;peak's  phn1=last  phn2 

(equal  second_phn_peak  phn_needed) ;  peak's  phn2=phn  needed 
(<=  min_span  maxjast_phn_dur)  ;  if  timing  looks  right 
(>=  max_span  min_last_phn_dur)) 

(return  (iist  peak  rest_peaks2))) 

((and  (equal  first_phn_peak  phn_needed) ;  peak's  phn1=phn  needed 
(<=  min_span  max_both_durs)  ;  if  timing  looks  right 
(>=  max_span  min_both_durs) 

;  and  this  is  last  phn  needed  or  both  phns  match 
(or  only_need_one 

(equal  second_phn_peak  second_phn_needed))) 

(return  (list  peak  rest_peaks2))) 

(t  nil)))))))) 


9 

;  NOT_ALREADY_FOUND  -  Return  nil  if  current  peak  starts  a  word  that  was 
;  already  found;  else  T 

.*****★************★**★*★★******★****************************************** 

> 

(defun  not_already_found  (word_entry  wordjist  start_peak) 

(let  ((word  (first  word_entry)) 

(word__phn1  (second  word_entry)) 

(word_phn2  (third  word_entry)) 

(peak_phn1  (first_phn  start_peak)) 

(last_word_start  (first  (first  (second  (first  (last  word_list)))))) 

(peak_start  (first  start jDeak))) 

(cond  ((null  wordjist)  t) 
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((and  (equal  peak_phn1  word _phn2)  ;  use  duration  of  both  phns 
(<  last_word_start 
(-  peak_start 
(+  (max_durword_phn1) 

(max_dur  word_phn2)))) 

(>  last_word_start 

0)) 

t)  ;  we  have  looked  far  enough  back,  so  not  found  (T) 

((and  (equal  peak_phn1  word_phn1)  ;  only  one  phn  duration  counts 
(<  last_word_start 
(-  peak_start 
(max_dur  word_phn1 ))) 

(>  last_word_start 

0)) 

t)  ;  we  have  looked  far  enough  back,  so  not  found  (T) 

((and  (equal  (first  (first  (last  word_list)))  word)) ;  same  word 
nil)  ;  this  start_peak  is  part  of  a  word  already  found 
(t  (not_already_found  word_entry  (butlast  word_list)  start_peak))))) 


:  DUMMY_PEAK  -  Return  a  dummy  peak  containing  desired  phonemes  (&  low 
confidence).  Negative  start  sample  number  is  used  to  flag  peak  as 
a  dummy  peak. 


(defun  dummy_peak  (phni  phn2) 

(list  -1  (list  phn1  phn2)  *dummy_confidence*)) 

J 

;  REAL_PEAK  -  Return  T  if  given  peak  is  not  a  dummy  peak  (else  nil) 
.************************************************************************** 


(defun  reaLpeak  (peak)  ;  returns  true  if  peak  is  real  vice  dummy 
(>=  (first  peak)  0))  ;  dummy  peaks  have  negative  sample  number 

1 

;  RECYCLE_PEAKS  -  If  last  word  search  located  phonemes  useful  for  this 
;  word,  add  appropriate  peaks  from  last  search  to  this  word's  peak  list 

.A************************************************************************** 


(defun  recycle_peaks  (last_peaks  phns_recycled  phns_needed  peaks_used) 
(let  ((Iast_peaks2  last_peaks) 

(recycled_phns  phns_recycled) 
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(peaks_used2  peaks_used)) 


phn 


list 


(loop  ;  add  useful  peaks  previously  found 
(cond  ((null  Iast_peaks2)  (return)) 

((null  recycled_phns)  (return)) 

((and  (equal  (first_phn  (first  Iast_peaks2))  ;pk  phn  1  = 

(first  recycled_phns))  ;next  needed  phn 

(or  (and  (null  (second  recycled_phns))  ;  no  more  to  recycle 
(or  (null  (first  phns_needed)) ;  &  no  more  needed 
(equal  (first  phns_needed) ;  or  1st  phn  needed  = 

(second_phn  (first  Iast_peaks2)))))  ;  peak's  2nd 

(equal  (second_phn  (first  Iast_peaks2))  ;pk  phn  2 
(second  recycled_phns)))) 

(setf  peaks_used2  (append  peaks_used2 

(list  (first  Iast_peaks2)))) 

(setf  Iast_peaks2  (rest  Iast_peaks2)) 

(setf  recycled_phns  (rest  (rest  recycled_phns)))) 

((and  (equal  (first_phn  (first  Iast_peaks2))  ;pk  phn1= 

(second_phn  (first  (last  peaks_used2))))  ;last  phn  in  peak 

(equal  (second_phn  (first  Iast_peaks2))  ;pk  phn  2= 

(first  recycled_phns)))  ;  next  needed  phn 

(setf  peaks_used2  (append  peaks_used2 

(list  (first  Iast_peaks2)))) 

(setf  Iast_peaks2  (rest  Iast_peaks2)) 

(setf  recycled_phns  (rest  recycled_phns))) 

(t  (return)))) 

peaks_used2)) 


ALL_AFTER  -  Return  portion  of  peak  list  after  given  peak 

*************************************<r************************************ 


(defun  alLafter  (rest_peaks  last_peak_used) 
(cond  ((null  rest_peaks)  nil) 

((null  last_peak_used)  rest_peaks) 

((equal  (first  rest_peaks)  last_peak_used) 
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(rest  rest_peaks)) 

(t  (alLafter  (rest  rest_peaks)  last_peak_used)))) 

************************************************************************** 

SKIP_EARLY  -  Trim  peaks  from  peak  list  that  are  too  close  to  start  pos 
to  contain  needed  phonemes 

************************<r************************************************* 


(defun  skip_early  (rest_peaks  start_pos  phns) 

(let  ((dur_sum  0) 

(locjist  rest_peaks) 

(limit  nil)) 

(dolist  (phn  phns) 

(setf  dur_sum  (+  dur_sum  (min_dur  phn)))) 

(setf  limit  (+  start_pos  dur_sum)) 

(loop 

(cond  ((null  locjist) 

(return  nil)) 

((<  (first  (first  loc_list))  limit) 

(setf  locjist  (rest  locjist))) 

(t  (return  locjist)))))) 

,************************************************************************** 

9 

:  UPDATE_FOUND  -  Add  1  or  2  phns  to  found  list  (depending  on  link  type) 
,  ★**★*★***★*★**★★★★*★*★****★****★*★★*****★**♦★**★****************★********* 


(defun  update_found  (phnsjound  peak_used)  ;  if  peak  phni  seen  before 
(if  (equal  (first_phn  peak_used)  (first  (last  phnsjound))) 

(append  phnsjound  (rest  (second  peak_used)))  ;  only  add  1  phn 
(append  phnsjound  (second  peak_used))))  ;  else  add  both  phns 

;  UPDATE_NEEDED  -  Remove  1  or  2  phns  from  phn  needed  list  (depending  on 
:  link  type) 


(defun  update_needed  (phns_needed  peak_used)  ;  if  peak  phn1=1st  needed 
(if  (equal  (first_phn  peak_used)  (first  phns_needed)) 

(rest  (rest  phns_needed))  ;  remove  both  phns 
(rest  phns_needed)))  ;  else  only  remove  1  phn 


244 


,**★★★*★★★*★★*★*★***★★★**★***★*★*★★★***★********★*★★★★★★★★***★*★★*★★*★★★**★ 

:  MIN_DUR  -  Return  minimum  expected  duration  of  given  phoneme  (in  samples) 
,********★*★★*★***★★★**★*★★**★**★*★****★★★****★**★*★******★★★******★**★★*★* 


(defun  min_dur  (phn) 

(let  ((durjist  (index_pos  phn  *phn_durs*))) 

(if  (null  (first  durjist)) 

0 

(first  durjist)))) 

;  MAX_DUR  -  Return  maximum  expected  duration  of  given  phoneme  (in  samples) 


(defun  max_dur  (phn) 

(let  ((durjist  (index_pos  phn  *phn_durs*))) 

(if  (null  (second  durjist)) 

0 

(second  durjist)))) 

,*******★*★★**★★*******★★*******★****★****★***★*****★******************★*** 

> 

;  BETTER_OF  -  Return  word  with  higher  confidence  level  based  on  peaks  used 


(defun  better_of  (word1  word2) 

(if  (>  (confidencejn  word1)  (confidencejn  word2)) 
word1 
word2)) 


CONFIDENCE_VAL  -  Return  confidence  level  based  on  peaks  used  to  build 
word 


(defun  confidence_val  (peaks) 

(let  ((conf_sum  0) 

(count  0)) 

(dolist  (peak  peaks) 

(setf  count  (+  1  count)) 

(setf  conf_sum  (+  conf_sum  (third  peak)))) 
(if  (>  count  0) 

(/  conf_sum  count) 

0))) 


245 


,************************************************************************** 


;  CONFIDENCEJN  -  Return  confidence  level  stored  in  word's  list 

i 

(defun  confidencejn  (word) 

(if  (null  word) 

0 

(third  word))) 

,**4r4rW*WW4r4r4r*****W***4r***W***4r*4r*illr*4nllr*WW*ifr4rW4r**4r*4r*W**4r******WW*W*W******** 

;  FIRST_PHN  -  Return  first  phoneme  in  given  peak's  phn  list 
,*****★*********★***********************★****★**********★***★************** 

(defun  firstjDhn  (peak) 

(if  (null  peak) 
nil 

(first  (second  peak)))) 

.**★**★**★★**★★**★★★*★★★★★★*★**★★*★***★★**********★**★★★***★**★******★★*★*★ 

;  SECOND_PHN  -  Return  first  phoneme  in  given  peak's  phn  list 
.★*★****★★*★★★*★*★*★★★★*★★★**★★***★*★**★*★★★****★****★*★**★★★************** 

(defun  second_phn  (peak) 

(if  (null  peak) 
nil 

(second  (second  peak)))) 

,★******★****★★★*★★★****★★★*★*★*★*★*★*★★*******★★★*★★★**★*************** 

.**********************W*************************‘*********************** 

;  SORTING,  LISTING,  &  I/O  RELATED  CODE 

;  SORT_WORDS  -  Sort  the  word  candidate  list  by  sample  number  and  then 
;  by  confidence  level 


(defun  sort_words  (word_candidates) 

(let  ((result  nil) 

(wl  nil)) 

(dolist  (word  word_candidates)  ;  start/stop  sample  numbers 

(setf  wl  (list  (list  (start_samp  word) 

(end_samp  word)) 
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(first  word)  ;  word 

(confidencejn  word)))  ;  confidence 
(setf  result  (append  result  (list  wl)))) 

(setf  result  (sort  result  #'more_conf)) 

(setf  result  (stable-sort  result  #'before)))) 


BEFORE  -  Returns  true  if  wrd1  has  lower  sample  number  than  wrd2  (for 
sort) 

★★************************************************************************ 


(defun  before  (wrd1  wrd2) 

(if  (<  (first  (first  wrdi))  (first  (first  wrd2))) 
t 

nil)) 


,W4r4r*4r4r**4r*4rifr*4r*4r*4rW****W4r***4r4r***W****4r*****4r*WW**4r*4r4r**4r4rW*W*******W***** 

;  MORE_CONF  -  Boolean  function  comparing  word  confidences  (for  sort) 
,************************************************************************** 


(defun  more_conf  (w1  w2) 
(if  (>  (third  w1)  (third  w2)) 
t 

nil)) 


************************************************************************** 

WRITE_WORDS  -  Write  word  candidates  to  designated  file  handle 
★*★****★**★★*★***★**★***★*★*★*★★*****★*★**★****★**★*★**★**★★***★***★*****★ 


(defun  write_words  (wordjist  Lhandle) 
(dolist  (word  wordjist) 

(format  Lhandle "~%  ~S"  word) 

(print  word)) 
t) 


.★**********************************************************************^^* 

I 

;  LIST_WORDS  -  Sort  and  display  list  of  word  candidates 

.*****************i#f**************************************************^^*^^^ 


(defun  lisLwords  () 

(dolist  (word  (sort_words  *word_candidates*)) 
(format  t "~%  ~S"  word)) 
t) 


;  SAVE_WORDS  -  Save  word  list  to  designated  filename 
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.************************************************************************** 

1 


(defun  save_words  (fname) 

(let  ((out_file  (open  fname  :direction  :output 

:if-does-not-exlst  :create 
:if-exists  :rename))) 

(dolist  (word  (sort_words  *word_candidates*)) 
(format  out_flle  "-%  ~S“  word) 

(print  word)) 

(close  out_file)) 
t) 


,*********************************************************************** 

:  LIST_TO_FILE  -  Save  contents  of  a  list  to  a  file  (one  item  per  line) 
,  *★★★**★*★*****★*******★****★**★★★**★*****************★★*********★****★* 


(defun  listjojile  (object_name  fname) 

(let  ((Lhandle  (open  fname  idirection  :output 

:if-does-not-exist  :create 
:if-exists  :append))) 
(dolist  (item  object_name) 

(format  Lhandle  "~S~%"  item) 

) 

(close  Lhandle) 

) 

) 


*★***★******★★★★★★*★*****:****★★★*★★*★**★★*******★★★★*★★**★★*★**★★*★**★★*** 

START_SAMP  -  Return  starting  sample  number  for  a  word  entry 
*★******★*****★★**********★***★**★*****★★★★★★*★★**★★★******★★★★**★★**★*★★★ 


(defun  start_samp  (word) 

(if  (null  word)  0 

(if  (>=  (first  (first  (second  word)))  0) 

(first  (first  (second  word))) 

(first  (second  (second  word)))))) 

;  END_SAMP  -  Return  ending  sample  number  for  a  word  entry 


(defun  end_samp  (word) 

(max  (first  (first  (last  (second  word)))) 

(if  (butlast  (second  word)) 

(first  (first  (last  (butlast  (second  word))))) 

0))) 
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it*itie*itic’k'kicitiki€ie‘k*-kic*irititic*itit*'k‘kie-kititir'kitititit*iticic'k'kit’k*i(iti(‘kic**iritieit’k*-k'kie‘k*’k'kifkic 


WORD_ENTRY  -  Return  most  confident  entry  for  given  word  in 
word  candidate  list 


(defun  word_entry  (correct_word  wordjist) 

(cond  ((null  wordjist)  nil) 

((string-equal  (first  (first  wordjist))  correct_word) 
(better_of  (first  wordjist) 

(word_entry  correct_word  (rest  wordjist)))) 
(t  (word_entry  correct_word  (rest  wordjist))))) 


.*★************************★*********★********************************** 

> 

;  STATS  -  Return  list  of  statistics  reflecting  how  well  word  was 

;  recognized 

I 

(defun  stats  (correct_word  wordjist) 

(let*  ((cword  (word_entry  correct_word  wordjist)) 

(cword_conf  (confidencejn  cword)) 

(cwordjrank  1) 

(cword_rank  1 ) 

(best_w  nil) 

(best_conf  0) 

(lbest_w  nil) 

(lbest_conf  0)) 

(print  correct_word) 

(dolist  (w  wordjist) 

(cond  ((>  (confidencejn  w)  best_conf) 

(setf  best_conf  (confidencejn  w)) 

(setf  best_w  (first  w)))) 

(cond  ((>  (confidencejn  w)  cword_conf) 

(setf  cword_rank  (1+  cword_rank)))) 

(cond  ((equal  (start_samp  w) 

(start_samp  cword)) 

(if  (>  (confidencejn  w)  cword_conf) 

(setf  cwordjrank  (1+  cwordjrank))) 

(cond  ((>  (confidencejn  w)  lbest_con^ 

(setf  lbest_conf  (confidencejn  w)) 

(setf  lbest_w  (first  w))))))) 

(list  (start_samp  cword) 
correct_word 
cword_conf 
cwordjrank 
cword_rank 
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Ibest_w 

lbest_conf 

best_w 

best_conf))) 

» 

*************************************************************W********* 

TEST  CODE 

ic*it**iciticiticititiriciritiricir'kicic**iritie*iticii*ieit'kitieititie*1ticitieic'kitititit*itiritititicicit‘k*it'ic**ic1cir*ic 

**************★★★******★*★**★★★*★******★**★*******★***★★***★**★******** 

************************************************************************** 
RECOGNIZE  -  Open  lexicon  files  &  look  for  words  in  sample  peak  list 

★★w*********************************************************************** 


(defun  test  () 

(let  ((index_file  (open  "phnindex.txt'')) 

(index_file2  (open  "phnindx2.txt")) 

(phn_durfile  (open  "phndurs.txt")) 

(fname  "dips")) 

(setf  *dict_file*  (open  "lexicon.txt")) 

(setf  *dict_file2*  (open  "lexicon2.txt")) 

(setf  *phoneme_index*  (read  index_file  nil  nil  nil)) 
(setf  *phoneme_index2*  (read  index_file2  nil  nil  nil)) 
(setf  *phn_durs*  (read  phn_durfile  nil  nil  nil)) 
(dip2peak  fname) 

(setf  *word_candidates*  nil) 

(peaks_to_words  *diphone_peak_list*) 

(close  index_file) 

(close  index_file2) 

(close  phn_durfile) 

(close  *dlct_file*) 

(close  *dict_file2*))) 


TEST_WORDS  -  Open  designated  diphone  recognizer  (Matlab  script  file 
recogniz.m)  files  &  look  for  words.  Save  results  to  second  filename. 

iritititiciciticicititititicitit*iticieiticititieit*iciciriciiieitit1ric‘k’kit'kir'kie'kicicicitieit*it1t-kicir'kic*icit*-k‘kitic*ifitic'k'k’k 


(defun  recognize  (nnet_outputsname  out_fname) 
(let*  ((index_file  (open  "phnindex.txt")) 
(index_file2  (open  "phnindx2.txt")) 
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(phn_durfile  (open  "phndurs.txt")) 

(debug_fname  "recogniz.dbg") 

(reportjname  "recogrep.txt") 

(nnet_outputs  (open  nnet_outputsname)) 

(out_file  (open  out_fname  idirection  :output 

:if-does-not-exist  icreate 
:if-exists  :rename)) 

(reportjile  (open  reportjname  idirection  :output 

:if-does-not-exist  :create 
:if-exists  :append)) 

(correct_word  nil) 

(temp_out  nil) 

(obj_read  nil)) 

(self  *dict_file*  (open  "lexicon.txt")) 

(setf  *dlct_file2*  (open  "lexicon2.b(t'')) 

(setf  *phonemeJndex*  (read  index_file  nil  nil  nil)) 

(setf  *phonemeJndex2*  (read  index_file2  nil  nil  nil)) 

(setf  *phn_durs*  (read  phn_durfile  nil  nil  nil)) 

(loop 

(setf  obj_read  (read  nnet_outputs  nil  nil  nil)) 

(when  (null  obj_read)  (return)) 

(print  obj_read  out_file) 

(setf  correct_word  (second  obj_read)) 

(format  t  "~%~S:  Reading  NNET  outputs..."  correct_word) 

(setf  temp_out  (open  "temp.dip"  :direction  :output 

:if-does-not-exist  :create 
:if-exists  :rename-and-delete)) 

(loop 

(setf  obj_read  (read  nnet_outputs  nil  nil  nil)) 

(when  (equal  (first  obj_read)  'word_end)  (return)) 

(print  obj_read  temp_out)) 

(close  temp_out) 

(format  t  "~%~S:  Locating  output  peaks..."  correct_word) 

(dip2peak  "temp.dip") 

(setf  *word_candidates*  nil) 

(format  t  "~%~S:  Looking  for  words..."  correct_word) 

(peaks_to_words  *diphone_peak_list*) 

(format  t  "~%~S:  Saving  results  to  file..."  correct_word) 

(write_words  (sort_words  *word_candidates*)  out_file) 

(list_to_file  (cons  (list  correct_word)  *word_candidates*)  debugjname) 
(print  (stats  correct_word  *word_candidates*)  report_file)) 

(close  nnet_outputs) 

(close  ouMile) 

(close  index_file) 

(close  index_file2) 

(close  phn_durfile) 
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(close  report_file) 
(close  *dict_file*) 
(close  *clict_file2*))) 


C.  PEAKS.CL 


;*  RECOG.CL  -  This  program  converts  the  output  file  from  a  diphone 
:*  recognizer  into  a  string  of  phonemes  and  then  words.  The  following  "type 
;*  definitions"  will  be  used  in  program  comments  (each  is  implemented  as 
;*  a  LISP  list): 

;*  diphone  ::=  (phoneme  phoneme); 

■*  diphone_record  ::=  (sample_num  diphone  confidence_value) 

;*  three_best_diphones  ::=  (diphone_record  diphone_record  diphone_record) 

.* 

;*  The  diphone  recognizer  converts  a  PCM  voice  file  into  an  ASCII  file 
;*  of  three_best_diphones  (one  line  each).  Each  three  best  diphone 
represents  the  three  output  classifiers  with  the  highest  activation 
;*  level  at  the  time  represented  by  the  sample_num.  In  principal,  each 
;*  diphone  classifier's  confidence_value  should  peak  when  the  sample 
:*  window  is  perfectly  aligned  with  the  diphone.  So  this  program  tracks 
;*  each  diphone  from  the  time  it  first  appears  in  a  three_best_diphones 
;*  line  until  it  is  replaced  by  a  new  "winner."  Once  a  diphone  is 
;*  displaced,  its  peak  diphone_record  is  appended  to  the  *diphone_peakJist*. 

;*  The  diphone_peak_list  is  then  (time)  sorted  for  conversion  to  a  list  of 
;*  phonemes  by  the  peak2phn  function. 

;  globals 

(defvar  *current_best_diphones*  nil) 

(defvar  *new_best_diphones*  '(nil  nil  nil)) 

:  FOLLOWING  FUNCTIONS  TURN  DIPHONE  RECOGNIZER  OUTPUT  INTO 
SEQUENCE  ;  OF  DIPHONE  PEAKS; 

> 

;  DIP2PEAK  -  main  function  for  opening  ASCII  file  of  three_best_diphones 
;  and  converting  to  a  string  of  diphone  peaks 
(defun  dip2peak  (fname) 

(let  ((dip_data_file  (open  fname))) 

(setf  *current_best_diphones*  nil) 

(setf  *new_best_diphones*  '(nil  nil  nil)) 
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(setf  *diphone_peak_list*  nil) 

(loop 

(let  ((three_best_diphones  (read  dip_data_file  nil  nil  nil))) 

(if  (null  three_best_diphones)  (return) 

(record_diphones  three_best_diphones)))) 
(refresh_best_diphones) 

(stable-sort  *diphone_peak_list*  #'is_earlier) 

(close  dip_data_file))) 

;  IS_EARLIER  -  predicate  for  use  by  stable-sort 
(defun  is_earlier  (first_dip  second_dip) 

(if  (<  (first  first_dip)  (first  second_dip)) 
first_dlp 
nil)) 

;  RECORD_DIPHONES  -  Save  latest  three_best_diphones  in  appropriate  slot 
;  in  *new_best_diphones*,  shift  results  to  *current_best_diphones*, 

:  and  reinitialize  *new_best_diphones*  to  nils. 

(defun  record_diphones  (three_dipjist) 

(when  (not  (null  three_dipjist)) 

(record_dip  (first  three_dip_list)) 

(record_dip  (second  three_dip_list)) 

(record_dip  (third  three_dipjist)) 

(refresh_best_diphones) 

(setq  *new_best_diphones*  (list  nil  nil  nil)) 
t)) 

:  RECORD_DIP  -  Save  a  diphone_record  in  the  appropriate  slot  in 
;  in  *new_best_diphones*.  If  the  diphone_record  contains  same 
;  diphone  as  an  existing  *current_best_diphones*  entry,  save  in 
;  corresponding  slot  In  *new_best_diphones*.  Else  tack  it  on 
;  end  of  *new_best_diphones*.  REFRESH_BEST_DIPHONES  will  then 
:  know  (by  a  nil  in  one  of  first  3  slots)  that  one  of  the  diphones 
;  it  was  tracking  has  been  replaced. 

(defun  record_dip  (new_diphone_rec) 

(cond 

((equalp  (second  new_diphone_rec)  (second  (first  *current_best_diphones*))) 
(update_best_diphone  1  new_diphone_rec)) 

((equalp  (second  new_diphone_rec)  (second  (second  *current_best_diphones*))) 
(update_best_diphone  2  new_diphone_rec)) 

((equalp  (second  new_diphone_rec)  (second  (third  *current_best_diphones*))) 
(update_best_diphone  3  new_diphone_rec)) 

(t  (new_best_diphone  new_diphone_rec)) )) 

;  BEST_OF  -  Used  by  UPDATE_BEST_DIPHONE  to  determine  which  of  2  diphones 
;  has  a  higher  confidence_value. 


253 


(defun  best_of  (first_dip  second_dip) 

(cond  ((null  first_dip)  second_dip) 

((null  second_dip)  first_dip) 

((>  (car  (last  first_dip))  (car  (last  second_dip)))  first_dip) 

(second_dip)  )) 

;  UPDATE_BEST_DIPHONE  -  Used  by  RECORD_DIP  to  put  best  of  old  &  new  dips 
;  in  the  appropriate  slot  of  *new_best_diphones*  list. 

(defun  update_best_diphone  (best_dip_number  new_best_dip_rec) 

(case  best_dip_number 
(1  (setf  (first  *new_best_diphones*) 

(best_of  new_best_dip_rec  (first  *current_best_diphones*)) )) 

(2  (setf  (second  *new_best_diphones*) 

(best_of  new_best_dip_rec  (second  *current_best_diphones*))  )) 

(3  (setf  (third  *new_best_diphones*) 

(best_of  new_best_dip_rec  (third  *current_best_diphones*)) )) 

(otherwise  nil)  )) 


;  NEW_BEST_DIPHONE  -  Used  by  RECORD_DIP  to  append  a  new  diphone_record 
;  on  end  of  *new_best_diphones*  for  future  tracking.  This  will  leave 
;  a  nil  in  one  of  first  3  slots  of  *new_best_diphones*  which  will,  in 
;  turn,  trigger  recording  of  the  peak  value  for  the  diphone  that  dropped 
;  off  the  "winners"  list. 

(defun  new_best_diphone  (new_best_dip_rec) 

(setf  *new_best_diphones* 

(append  *new_best_diphones*  (list  new_best_dip_rec))  )) 


;  REFRESH_BEST_DIPHONES  -  Used  by  RECORD_DIPHONES  to  move  the  lastest 
;  "winning”  diphones  from  *new_best_diphones*  to  *current_best_diphones*. 

;  But  first,  any  nils  in  first  3  slots  of  *new_best_diphones*  (meaning 
;  a  diphone  has  dropped  off  the  "winners  list")  trigger  recording  of 
;  diphone_record  in  corresponding  slot  of  *current_best_diphones*  to 
;  *diphone_peakJist*  (at  any  time,  *current_best_diphones*  has  most 
;  confident  [peak]  value  for  3  diphones  being  tracked). 

(defun  refresh_best_diphones  () 

(if  (null  (first  *new_best_diphones*)) 

(record_diphone_peak  (first  *current_best_diphones*))) 

(if  (null  (second  *new_best_diphones*)) 

(record_diphone_peak  (second  *current_best_diphones*))) 

(If  (null  (third  *new_best_diphones*)) 

(record_diphone_peak  (third  *current_best_diphones*))) 

(setf  *new_best_diphones*  (remove_nils  *new_best_diphones*)) 

(setf  *current_best_diphones*  *new_best_diphones*) 
t) 
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;  RECORD_DIPHONE_PEAK  -  Append  a  peak  diphone_record  to  *diphone_peak_list. 
(defun  record_diphone_peak  (latest_diphone_peak) 

(when  (not  (null  latest_diphone_peak)) 

(setf  *diphone_peakJist* 

(append  *diphone_peak_list*  (list  latest_diphone_peak))) 
t  )) 


;  REMOVE_NILS  -  Used  by  REFRESH_BEST  DIPHONES 
(defun  remove_nils  (nillyjist) 

(if  (not  (null  (first  nillyjist))) 

(append  (list  (first  nillyjist))  (remove_nils  (rest  nillyjist))) 
(if  (not  (null  (rest  nillyjist))) 

(remove_nils  (rest  nilly_list))  )  )) 

(defun  testfile  (fname) 

(dip2peak  fname) 

*diphone_peakJist*) 

(defun  test_peaks  () 

(format  t "~%  DIPHONE  PEAKS:~%") 

(print  (testfile  “dips")) 

(read-line) 

(format  t FINAL  DIPHONE  SEQUENCE:~%") 
(print  *diphone_peakJist*)) 


D.  DICT.CL 

;  PHONETIC  DICTIONARY-RELATED  FUNCTIONS 

.★*****★**★******★**★***★***★★*********★******************************** 

.★********************************************************************** 

I 

,*********************************************************************** 

I 

(defun  write Jile  (object_name  fname) 

(let  ((f_handle  (open  fname  :direction  :output))) 

(print  object_name  f_handle) 

(close  Lhandle)  )) 
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■***★********★**★***★★*★★★★***★★★*******★★*★*★★★★**★**★**★★★**★★**★**★**★★ 


;  CREATE_PHN_DICT  -  Reads  the  standard  TIMIT  phonetic  dictionary  and 

;  converts  each  entry  into  a  LISP  list.  Phonemes  are  placed 

;  first  to  allow  sorting  of  entire  dictionary  by  phoneme  vice  by  word. 
,*********************************************************************** 

I 

(defun  create_phn_dict  (in_fname  out_fname) 

(let  ((lexicon_file  (open  in_fname)) 

(output_file  (open  outjname  ;direction  :output)) 

(word_read  nil)  ) 

(format  outputjile "(") 

(loop 

(setf  word_read  (read_word  lexiconjile)) 

(if  (null  word_read) 

(return) 

(if  (>  (length  word_read)  2) 

(print  (append  (rest  word_read) 

(list  (first  word_read))) 
output_file  )))) 

(format  output_file ")") 

(close  lexicon_file) 

(close  output_file))) 


.*****★*★*****★****★★****★★***********★****★******★************★*★**★*** 

} 

.*******★*★*★****★★★★★****★*★★***********★**★******★*★★★********★***★*** 

> 


;  PUT_WORDS_FIRST  -  Returns  phonetic  dictionary  to  word  first  format 
;  after  sorting  by  phonemes. 

(defun  put_words_first  (in_fname  out_fname) 

(let  ((phn_dict_file  (open  in_fname)) 

(output_file  (open  out_fname  rdirection  :output)) 

(obj_read  nil)) 

(setf  obj_read  (read  phn_dict_file  nil  nil  nil)) 

(if  (not  (null  obj_read))  (file-position  phn_dict_file  istart)) 

(loop 

(setf  obj_read  (read  phn_dict_file  nil  nil  nil)) 

(cond  ((null  obj_read)  (return)) 

(t  (print  (append  (last  obj_read) 

(butlast  obj_read))  output_file)))) 

(close  phn_dict_file) 

(close  output_file)  )) 

.*********<«r*****iAr4r*4r4r4r**^**4r*iAr**iflr*4r*4r*4r4r4r*4r****illr4r4r4r*4r4r***4r*4rikr'Ar4r****«***‘**** 

3 

;  CREATE_PHN_DICT2  -  Reads  the  standard  TIMIT  phonetic  dictionary  and 
;  converts  each  entry  into  a  LISP  list.  Phonemes  are  placed 
;  first  to  allow  sorting  of  entire  dictionary  by  phoneme  vice  by  word. 
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» 


(defun  create_phn_dict2  (in_fname  out_fname) 

(let  ((lexicon_file  (open  injname)) 

(output_flle  (open  outjname  :direction  :output)) 
(word_read  nil) ) 

(format  outputjile "(") 

(loop 

(sett  word_read  (read_word  lexiconjile)) 

(if  (null  word_read) 

(return) 

(if  (>  (length  word_read)  2) 

(print  (append  (rest  (rest  word_read)) 

(list  (first  word_read)  (second  word_read)) ) 
output_file)))) 

(format  outputjile ")") 

(close  lexicon  Jile) 

(close  outputjile))) 

) 

I 


:  PUT_WORDS_FIRST2  -  Returns  phonetic  dictionary  to  word  first  format 
;  after  sorting  by  phonemes. 

(defun  put_wordsJirst2  (in Jname  outjname) 

(let  ((phn_dictjile  (open  injname)) 

(outputjile  (open  outjname  :direction  :output)) 

(obj_read  nil)) 

(setf  obj_read  (read  phn_dictjile  nil  nil  nil)) 

(if  (not  (null  obj_read))  (file-position  phn_dictjile  :start)) 

(loop 

(setf  obj_read  (read  phn_dict  Jile  nil  nil  nil)) 

(cond  ((null  obj_read)  (return)) 

((>  (length  obj_read)  2) 

(print  (append  (last  (butlast  obj_read)) 

(last  obj_read) 

(butlast  (butlast  obj_read))) 
outputjile)))) 

(close  phn_dictjile) 

(close  outputjile))) 

,***ic**ie*it*iti(it*itititic1tit*itititieiticic*ieitititieie'kici(it'icieitit'kic**ic'kicit'kit-kitititit‘itit'k1t’kititieiciticicit 


;  BUILDJNDEX  -  Creates  a  list  of  phoneme/file-position  pairs, 
;  Each  pair  records  the  file  position  of  the  first  word  entry  that 
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;  starts  with  the  given  phoneme. 

(defun  build_index  (fname  out_fname) 

(let  ((lexicon_file  (open  fname)) 

(last_phn  nil) 

(word_read  nil) 

(file_pos  0) 

(phnjndex  nil) ) 

(setf  word_read  (read  lexiconjile  nil  nil  nil)) 

(if  (not  (null  word_read))  (file-position  lexicon_file  :start)) 

(loop 

(setf  file_pos  (file-position  lexiconjile)) 

(setf  word_read  (read  lexiconjile  nil  nil  nil)) 

(cond  ((null  word_read)  (return)) 

((not  (equal  last_phn  (second  word_read))) 

(setf  phn_index 

(append  phn_index 

(list  (list  (second  word_read)  file_pos)))) 

(setf  last_phn  (second  word_read))) 

(t  nil))) 

(close  lexiconjile) 

(writejile  phn_index  out Jname) )) 

.************************************w************w*********************** 

I 

;  BUILD_INDEX2  -  Creates  a  list  of  phoneme/file-position  pairs. 

;  Each  pair  records  the  file  position  of  the  first  word  entry  that 

;  has  phoneme  2  =  the  given  phoneme. 
,*****★★***★******★*****★★*★*★**★***★************★******★*************** 

(defun  buildjndex2  (fname  outjname) 

(let  ((lexiconjile  (open  fname)) 

(last_phn  nil) 

(word_read  nil) 

(file_pos  0) 

(phnjndex  nil)) 

(setf  word_read  (read  lexiconjile  nil  nil  nil)) 

(if  (not  (null  word_read))  (file-position  lexiconjile  :start)) 

(loop 

(setf  fiIe_pos  (file-position  lexiconjile)) 

(setf  word_read  (read  lexiconjile  nil  nil  nil)) 

(cond  ((null  word_read)  (return)) 

((not  (equal  last_phn  (third  word_read))) 

(setf  phnjndex 

(append  phnjndex 

(list  (list  (third  word_read)  file_pos)))) 

(setf  last_phn  (third  word_read))) 

(t  nil))) 
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(close  lexiconjile) 

(write_file  phnjndex  out_fnanne))) 

I 

,iritititifiticiciticititititicit*icicirificitititi{itiritiri(icicititit1tiei('k1f'kitititificic'kieie1titif'kitit1t'k'k1c'k'kic*iri('kic'kititit 

f 

(defun  sort_dict  (injname  out_fname) 

(let  ((tmp_dict  (open  in_fname)) 

(out_dict  (open  out_fname  idirection  :output)) 

(dictjist  nil)) 

(setf  dictjist  (read  tmp_dict  nil  nil  nil)) 

(sort  dictjist  #'word<) 

(dolist  (wrd  dictjist) 

(print  wrd  out_dict)) 

(close  tmp_dict) 

(close  out_dict))) 


,**ik***^**it******ik*****iir********ir*****ir**iif**it***ie***iHtik****itiir*irititit****irir 

t 

,*************'it*****iir***ilr**ilriir**itit****iititiltiirir**it*irititiir***itiieiriririt***itieit**it*itit* 

f 

(defun  word<  (wrd1  wrd2) 

(if  (or  (string<  (symbol-name  (first  wrd1)) 

(symbol-name  (first  wrd2))) 

(and  (equal  (first  wrdi) 

(first  wrd2)) 

(string<  (symbol-name  (second  wrd1)) 

(symbol-name  (second  wrd2)) ) ) ) 
t 

nil )) 


i 

9 


;  READ_WORD_LINE  -  Returns  the  next  non-comment  line  from  TIMIT  dictionary, 
(defun  read_wordJine  (file_handle) 

(let  ((line_read  nil)) 

(loop 

(setf  line_read  (read-line  file_handle  nil  nil  nil)) 

(cond  ((null  line_read)  (return)) 

((and  (not  (equal  (char  line_read  0)  #\;)) 

(not  (null  (read-from-string  line_read)))) 

(return  line_read))  ) ) )) 


.****************************iieit****itilr***it***it*it*it**it****ieititit*********ir** 

I 


;  TEXT_TO_LIST  -  Converts  a  TIMIT  dictionary  entry  from  text  to  a  LISP  list. 
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(defun  text_toJist  (textjine) 

(if  (not  (null  textjine)) 

(let  ((textjist  (fonnat  nil  "~S"  (list  textjine)))) 

(setf  (char  text Jist  1)  #\Space) 

(setf  (char  textjist  (-  (length  textjist)  2))  #\Space) 
(read-from-string  textjist) ) )) 

,*★********★★**★*★★★**★*★★★********★****★*******★★★★*★*★****★★****★★★★** 

9 

,★****'******★*****★********★**★★*★★*★*****★*****★★****★*★******★****★*** 


;  CLEAN_TEXT  -  Removes  extraneous  characters  from  TIMIT  dictionary  entries, 
(defun  cleanjext  (phn  Jext) 

(if  (not  (null  phnjext)) 

(let  ((cleaned Jext  phnjext) 

(extraneous_chars  '(#V  #\1  #\2  #\3))) 

(dotimes  (i  (length  phnjext)  cleaned  Jext) 

(if  (member  (char  phnjext  i)  extraneous_chars) 

(setf  (char  cleanedjext  i)  #\Space) ) 

(if  (equal  (char  phnjext  i)  #V) 

(setf  (char  cleanedjext  i)  #\~) ) ) 
cleanedjext) )) 

,*★*★★*★★★**★***★**★★*★★*★★★★★★★***★*★★★*★★★★★***********★******★★***★** 

,*****************  fcit*it*it*iciticitit’kie**itit'k'kiti(icie'k*itfc‘k****icic'kie'kit**ie*ieic*'kitifkie*’k* 


;  READ_WORD  -  Returns  next  TIMIT  dictionary  entry  as  a  LISP  list, 
(defun  read_word  (file_handle) 

(text JoJist  (cleanjext  (read_word Jine  file_handle)))) 

.  ♦**★***★*★*★★*****★********★★*★★**★**★***★*********★********★*★**★*★*** 

9 

,*******★***★**★****★****★****★***★******★****★*★*******★***********★*** 


;  BIN_SEARCH  -  Binary  search  used  by  INDEX_POS  to  determine  first 
;  dictionary  entry  with  given  starting  phoneme 
(defun  bin_search  (phoneme  start_n  end_n  index) 

(let  ((mid_n  (round  (/  (+  start_n  end_n)  2)))) 

(cond  ((>=  start_n  end_n) 

(if  (equal  (first  (nth  start_n  index))  phoneme) 

(second  (nth  start_n  index)) 
nil)) 

((string<  (string  phoneme) 

(string  (first  (nth  mid_n  index)))) 

(bin_search  phoneme  start_n  (1  -  mid_n)  index)) 

((string>  (string  phoneme) 

(string  (first  (nth  mid_n  index)))) 

(bin_search  phoneme  (1+  mid_n)  end_n  index)) 
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( t  (second  (nth  mid_n  index)))))) 


.ic1tifieitic*ic*1titie**ieitieicieiticicic*iticitieie*'kitrk'kitit'kititit'k*’kititititititititit*i(it1ticicirir'kitieiriticifie*itic 

I 


;  INDEX_POS  -  Uses  BIN_SEARCH  to  determine  first 
;  dictionary  entry  with  given  starling  phoneme 
(defun  index_pos  (phoneme  index) 

(bin_search  phoneme  0  (-  (length  index)  1)  index)) 

.*W*W*W4rir*W*W4r4r4r*W*WWW*W4rWW**4rW*WW****W*<«rWW*WW******W*W***W*****WWWW*WWW 

) 

I 

(defun  make_dicts  () 

(format  t  "~%Preparing  for  sort  #1 ...") 

(create_phn_dict  "timitdic.txt"  "lexicon.txt") 

(format  t  "~%Sorting  lexicon  #1...") 

(sort_dict  "lexicon.txt"  "tempdic.txt") 

(format  t  "~%Reordering  #1 ...") 

(put_words_first  "tempdic.txt"  "lexicon.txt") 

(format  t  "~%Building  index  #1 ...") 

(buildjndex  "lexicon.txt"  "phnindex.txt") 

(format  t  "~%Preparing  for  sort  #2...") 

(create_phn_dict2  "timitdic.txt"  "lexicon2.txt") 

(format  t  "~%Sorting  #2...") 

(sort_dict  "lexicon2.txt"  "tempdic.txt") 

(format  t  "~%Reordering  #2...") 

(put_words_first2  "tempdic.txt"  “lexicon2.txt") 

(format  t  "~%lndexing  #2...") 

(buildjndex2  "lexicon2.txt"  "phnindx2.txt") 

(format  t  "~%Done  building  lexicons") 
t) 
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E.  LEXICON.TXT 


(OBJECT~N  AA  B  JH  EH  K  T) 

(OBJECTS  AA  B  JH  EH  K  T  S) 

(OBSESSION  AA  B  S  EH  SH  IX  N) 

(OBSESSIONS  AA  B  S  EH  SH  IX  N  Z) 

(OBSOLESCENT  AA  B  S  AX  L  EH  S  EN  T) 

(OBSTACLES  AA  B  S  T  IH  K  EL  Z) 

(OBVIOUS  AA  B  V  lY  AX  S) 

(OBVIOUSLY  AA  B  V  lY  AX  S  L  lY) 

(ODD  AA  D) 

(ODDS  AA  D  Z) 

(ODDS-ONAADZAON) 

(AHAH  AA  HH  AA) 

//  BULK  OF  LEXICON  OMITTED  TO  SAVE  SPACE  // 

(USE~N  Y  UW  S) 

(USE~V  Y  UW  Z) 

(USING  YUWZ IX  NG) 

(USUAL  Y  UW  ZH  UW  EL) 

(USUALLY  Y  UW  ZH  UW  AX  L  lY) 

(UTILIZING  Y  UW  T  AX  L  AY  Z  IX  NG) 

(UTOPIANISM  Y  UW  T  OW  P  lY  EN  IH  Z  EM) 

(UTOPIANS  Y  UW  T  OW  P  lY  EN  Z) 

(YOU  Y  UW) 


F.  LEXICON2.TXT 


(BOB  B  AA  B) 

(BOBBY  B  AA  B  lY) 

(BOBCAT  BAABKAET) 

(COBBLER~S  K  AA  B  L  AXR  Z) 

(JOB  JH  AA  B) 

(ROB  R  AA  B) 

(ROBIN  R  AA  B  IX  N) 

(WOBBLE  W  AA  B  EL) 

//  BULK  OF  LEXICON  OMITTED  TO  SAVE  SPACE  // 
(EASY  lY  Z  lY) 

(EASYGOING  lY  Z  lY  G  OW  IX  NG) 

(ASTHMA  AE  Z  M  AX) 
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G.  PHNINDEX.TXT 


((AAO)  (AE1406) 
(AY  17254)  (B  18190) 
(ER  40653)  (EY  41015) 
(lY  69715)  (JH  70173) 
(OW  102942)  (OY  104478) 
(T  147921)  (TH  154674) 


(AH  5923)  (AO  8582) 
(CH  26124)  (D  27445) 
(F  41482)  (G  49147) 
(K  72175)  (L  86484) 
(P  104628)  (R  116939) 
(UW  156159)  (V  156176) 


(AW  10417) 
(DH  36700) 
(HH  53189) 
(M  91274) 
(S  126411) 
(W  158877) 


H.  PHNINDX2.TXT 


((AAO)  (AE8496)  (AH  19748)  (AO  26768)  (AW  32328) 
(AXR  41419)  (AY  43 122)  (B  48901)  (CH  50132)  (D  50245) 
(EH  52196)  (ER  63293)  (EY  67429)  (F  72390)  (G  73210) 
(BH  74503)  (1X90607)  (lY  91610)  (JH  97191)  (K  97602) 

(M  110634)  (N  113252)  (NG  123018)  (OW  123226)  (OY  127976) 
(R  132537)  (S  146744)  (SH  148084)  (T  148426)  (TH  153434) 
(UW  155688)  (V  158995)  (W  160867)  (Y  163488)  (Z  164536)) 


1.  PHNDURS.TXT 


( 

(AA(381  7735  1984)) 
(AE  (  660  6800  2144)) 
(AH  (  300  5442  1414)) 
(AO  (421  5760  1957)) 
(AW  (  1003  6640  2563)) 
(AX  (74  2778  766)) 
(AXR  (305  3752  1236)) 
(AY  ( 616  6790  2485)) 
(B(47  1229  279)) 

(CH  ( 320  3200  1382)) 
(D(81  1840  385)) 

(DH(  no  1839  563)) 
(EH  (343  4498  1448)) 
(EL  (440  3810  1440)) 
(EM  (656  3300  1281)) 
(EN(204  2943  1236)) 
(ENG  (694  1882  1274)) 
(ER(560  5862  1937)) 
(EY(668  6250  2017)) 
(F(213  4880  1640)) 
(G(124  1661  442)) 


(AX  11006) 
(EH  37226) 
(IH  59913) 
(N  99381) 
(SH  145371) 
(Y  162935)) 


(AX  33915) 
(DH  52101) 
(HH  74464) 
(L  102931) 
(P  128847) 
(UH  153811) 
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(HH(220  3440  1079)) 
(HX(480  32767  2714)) 
(IH(333  6386  1244)) 
(DC  (  186  3189  819)) 
aY(372  5540  1529)) 
(JH(238  3073  998)) 
(K(98  2716  829)) 
(L(203  3677  970)) 

(M  ( 130  3676  1032)) 

(N  (  152  3397  880)) 

(NG  ( 256  3844  1003)) 
(OW  ( 559  5520  2039)) 
(OY  ( 1002  5096  2707)) 
(P  ( 100  1930  707)) 
(R(36  2783  893)) 

(S  ( 374  5709  1798)) 
(SH  ( 385  4213  1895)) 
(T(74  2315  773)) 
(TH(201  3602  1404)) 
(UH(292  3509  1217)) 
(UW  ( 390  5034  1674)) 
(V  ( 192  2464  947)) 

(W  ( 152  3280  979)) 

(Y  (  183  2632  868)) 
(Z(410  4310  1365)) 
(ZH(355  4218  1362)) 

) 


APPENDIX  D.  LIST  OF  157  DIPHONES  IN  MEDIUM  AND  LARGE  DATA  SETS 


AAK 


AAL 


AAR 


AEK 


AEM 


AEN 


AHM 


AH  N 


AHS 


AOL 


AOR 


AWN 


AXB 


AXL 


AXM 


AXN 


AXP 


AXS 


AXV 


AXR  HX 


AXRZ 


BAY 


BEL 


BIH 


BR 


DHX 


DIH 


DIX 


DR 


DHAX 


DH  IX 


DH  lY 


EHK 

IX  N 

LIH 

EHL 

IX  NG 

LIX 

EHN 

IX  P 

LIY 

EHR 

IX  S 

MAH 

EHS 

IX  T 

MAY 

EHV 

IX  V 

MEY 

EYN 

IX  Z 

MIX 

EY  SH 

lY  HX 

MP 

EYT 

lY  IX 

ND 

FAO 

lYK 

NDH 

FAXR 

lYN 

N  HX 

FIH 

lY  NG 

NIH 

FR 

lYP 

NIX 

GR 

lYS 

N  lY 

HXD 

lYZ 

NS 

HX  DH 

JH  IX 

NT 

HX  HH 

KAA 

NZ 

HXK 

KAE 

PAXR 

HXW 

KAO 

PL 

IHG 

KEL 

PR 

IHK 

KIX 

RAA 

IH  L 

KL 

RAE 

IHM 

KR 

RAH 

IHN 

KS 

RAY 

IH  NG 

KT 

REH 

IHS 

K  W 

REY 

IH  SH 

LAA 

RIH 

IHZ 

LAE 

RIX 

IX  D 

LAX 

RIY 

IX  G 

LAY 

ROW 

IX  JH 

LD 

SAH 

IX  K 

LEH 

SAX 

IX  M 

LEY 

SHX 

SIX 


SIY 


SK 


SP 


ST 


SH  EN 


SH  IX 


TAE 


TAXR 


TEH 


THX 


TIH 


TlX 


TIY 


TR 


TS 


VAXR 


WAX 


WIH 


WIX 


WIY 


ZAX 


ZHX 


ZIX 


ZK 


APPENDIX  E.  TIMIT  VOCABULARY  LIST  FOR  WORD  TESTS 


A.  DESCRIPTION 

The  following  table  lists  the  340  words  in  the  TIMIT  lexicon  that  can  be  constructed  using 
the  157  diphones  of  the  Medium  and  Large  Data  Sets.  The  lexical  analyzer  scans  the  diphone 
stream  from  the  neural  network  and  identifies  the  phonemes  for  a  word  if  they  are  present. 
Because  each  phoneme  in  a  word  forms  part  of  two  diphones,  it  is  possible  to  detect  all  of  a 
word's  phonemes  even  if  some  diphones  are  missing  from  the  diphone  stream  (up  to  every  other 
diphone  can  be  missing).  Thus,  it  is  actually  possible  to  detect  far  more  than  the  340  words  listed 
here.  However,  the  additional  words  would  suffer  a  lower  probability  of  detection.  Therefore, 
this  vocabulary  list  is  limited  to  words  with  all  diphones  among  the  neural  network's  output 
classes,  except  as  described  in  the  next  paragraph. 

The  number  following  each  word  represents  the  number  of  that  word's  diphones  that  are 
not  among  the  157  diphones  of  the  Medium  and  Large  Data  Sets.  Currently,  VOCAB.ADA  will 
only  allow  a  word's  first  diphone  to  be  missing  from  the  diphone  list.  In  every  case,  a  word's  first 
diphone  is  considered  to  be  a  period  of  silence  (denoted  by  the  dummy  phoneme  HX)  combined 
with  the  word's  first  phoneme.  The  lexical  analyzer  must  be  able  to  detect  a  word  without  this 
diphone  because  in  continuous  speech  it  cannot  count  on  having  a  period  of  silence  before  every 
word.  Thus,  VOCAB.ADA  will  consider  a  word  to  be  in- vocabulary,  even  if  this  first  diphone  is 
not  in  the  current  list  of  recognizable  diphones.  The  diphone  listed  after  each  number  is  the 
name  of  any  diphones  that  are  not  among  the  157  diphones  of  the  Medium  and  Large  Data  Sets. 
Again,  this  list  will  currently  only  contain,  at  most,  the  word's  first  diphone. 


B.  VOCABULARY 


a,  1,H#/AX 

fishing,  1,H#/F 

remains,  1  ,H#/R 

ability,  1,H#/AX 

fist,  1,H#/F 

rent,  1,H#/R 

abyss,  1,H#/AX 

fists,  1,H#/F 

rented,  1,H#/R 

accident,  1  ,H#/AE 

fix,  1  ,H#/F 

rest,  1,H#/R 

act,  1,H#/AE 

fixed,  1,H#/F 

restricted,  1  ,H#/R 

actor,  1,H#/AE 

for,  1,H#/F 

rickety,  1,H#/R 

actors,  1,H#/AE 

forest,  1,H#/F 

rim,  1  ,H#/R 
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acts,  1,H#/AE 

four,  1,H#/F 

risks,  1,H#/R 

ain’t,  1,H#/EY 

frantically,  1  ,H#/F 

rock,  1,H#/R 

all,  1,H#/A0 

frequent,  1,H#/F 

rockets,  1  ,H#/R 

am,  1  ,H#/AE 

friend,  1,H#/F 

run,  1  ,H#/R 

an,  1,H#/AE 

from,  1  ,H#/F 

running,  1,H#/R 

and,  1,H#/AE 

front,  1  ,H#/F 

runs,  1  ,H#/R 

ant,  1,H#/AE 

frustration,  1,H#/F 

rust,  1,H#/R 

ants,  1  ,H#/AE 

grains,  1,H#/G 

scar,  1,H#/S 

any,  1  ,H#/EH 

grants,  1  ,H#/G 

scenes,  1  ,H#/S 

apply,  1,H#/AX 

greasing,  1,H#/G 

score,  1,H#/S 

approximated,  1,H#/AX 

greasy,  1,H#/G 

screen,  1  ,H#/S 

approximation,  1,H#/AX 

great,  1,H#/G 

sea,  1,H#/S 

are,  1,H#/AA 

greater,  1  ,H#/G 

secret,  1,H#/S 

ate,  1,H#/EY 

green,  1,H#/G 

see,  1,H#/S 

aunt,  1,H#/AE 

greenness,  1,H#/G 

seeing,  1  ,H#/S 

aw,  1,H#/AO 

grill,  1,H#/G 

seeking,  1,H#/S 

awe,  1  ,H#/AO 

grin,  1,H#/G 

seen,  1,H#/S 

axis,  1  ,H#/AE 

grinned,  1,H#/G 

selected,  1,H#/S 

be,  1,H#/B 

grow,  1  ,H#/G 

selecting,  1,H#/S 

beans,  1,H#/B 

grunt,  1  ,H#/G 

solicits,  1  ,H#/S 

been,  1,H#/B 

i,  1,H#/AY 

some,  1,H#/S 

beep,  1,H#/B 

ill,  1,H#/IH 

son,  1  ,H#/S 

being,  1,H#/B 

in,  1,H#/IH 

splinter,  1  ,H#/S 

big,  1,H#/B 

innocence,  1,H#/IH 

sprained,  1,H#/S 

bill,  1,H#/B 

insulate,  1,H#/IH 

spray,  1,H#/S 

biscuit,  1  ,H#/B 

insulator,  1,H#/IH 

spring,  1,H#/S 

brain,  1  ,H#/B 

integrated,  1  ,H#/IH 

squeaked,  1  ,H#/S 

brand,  1,H#/B 

integration,  1,H#/IH 

stacked,  1  ,H#/S 

breast,  1  ,H#/B 

Intelligence,  1  ,H#/IH 

stand,  1,H#/S 

breeze,  1  ,H#/B 

intelligent,  1  ,H#/IH 

steep,  1  ,H#/S 

brick,  1,H#/B 

Intent,  1,H#/IH 

stick,  1,H#/S 

bricks,  1,H#/B 

is,  1,H#/IH 

still,  1,H#/S 

brim,  1,H#/B 

knee,  1,H#/N 

straight,  1,H#/S 

bring,  1,H#/B 

knees,  1,H#/N 

strain,  1  ,H#/S 

build,  1,H#/B 

knick-,  1,H#/N 

strained,  1,H#/S 
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building,  1,H#/B 

lack,  1,H#/L 

stranded,  1  ,H#/S 

buy,  1,H#/B 

lamp,  1,H#/L 

stray,  1  ,H#/S 

by,  1,H#/B 

land,  1,H#/L 

stress,  1,H#/S 

call,  0 

late,  1,H#/L 

sum,  1,H#/S 

called,  0 

later,  1,H#/L 

sun,  1,H#/S 

calling,  0 

latest,  1,H#/L 

tackle,  1,H#/T 

camp,  0 

lay,  1,H#/L 

tan,  1  ,H#/T 

can,  0 

leap,  1,H#/L 

tax,  1  ,H#/T 

cant,  0 

least,  1,H#/L 

taxi,  1,H#/T 

canister,  0 

less,  1,H#/L 

tea,  1  ,H#/T 

canned,  0 

lest,  1,H#/L 

tell,  1,H#/T 

canteen,  0 

lie,  1,H#/L 

ten,  1  ,H#/T 

canter,  0 

lily,  1,H#/L 

tenant,  1,H#/T 

car,  0 

limit,  1  ,H#/L 

tenant's,  1,H#/T 

cease,  1  ,H#/S 

liquid,  1  ,H#/L 

tend,  1  ,H#/T 

chorused,  0 

list,  1,H#/L 

tended,  1,H#/T 

clamp,  0 

listed,  1,H#/L 

test,  1,H#/T 

clay,  0 

lists,  1,H#/L 

test-run,  1,H#/T 

clean,  0 

lock,  1,H#/L 

testicle,  1,H#/T 

cleaned,  0 

locked,  1  ,H#/L 

tests,  1,H#/T 

cleaning,  0 

main,  1,H#/M 

text,  1,H#/T 

cleans,  0 

maintenance,  1  ,H#/M 

texts,  1,H#/T 

clever,  0 

mate,  1  ,H#/M 

the,  0 

cling,  0 

mates,  1,H#/M 

these,  0 

collie,  0 

may,  1,H#/M 

tim,  1,H#/T 

cory,  0 

money,  1  ,H#/M 

tin,  1  ,H#/T 

crack,  0 

must,  1,H#/M 

track,  1,H#/T 

cracked,  0 

my,  1  ,H#/M 

tracks,  1,H#/T 

crest,  0 

o',  1,H#/OW 

train,  1  ,H#/T 

crisp,  0 

of,  1,H#/AX 

training,  1,H#/T 

criss-,  0 

oh,  1  ,H#/OW 

trait,  1,H#/T 

cry,  0 

or,  1,H#/AO 

traits,  1,H#/T 

dig,  0 

origin,  1  ,H#/AO 

transience,  1,H#/T 

dim,  0 

plan,  1,H#/P 

transit,  1,H#/T 

discipline,  0 

plane,  1  ,H#/P 

tray,  1  ,H#/T 
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disciplined,  0 

planes,  1,H#/P 

dish,  0 

planets,  1,H#/P 

dishes,  0 

planned,  1,H#/P 

disk,  0 

planning,  1,H#/P 

display,  0 

plans,  1,H#/P 

distance,  0 

plant,  1,H#/P 

distances,  0 

planted,  1,H#/P 

distracted,  0 

plate,  1,H#/P 

distress,  0 

plated,  1,H#/P 

districts,  0 

play,  1,H#/P 

drain,  0 

please,  1,H#/P 

dress,  0 

plenty,  1,H#/P 

dressing,  0 

practical,  1,H#/P 

dressy,  0 

practiced,  1  ,H#/P 

dry,  0 

practices,  1  ,H#/P 

ease,  1,H#/IY 

princes,  1,H#/P 

easily,  1,H#/IY 

print,  1,H#/P 

eight,  1,H#/EY 

prints,  1,H#/P  ^ 

electric,  1  ,H#/AX 

queen,  0  ' 

electrical,  1,H#/AX 

quick,  0  ' 

electricity,  1,H#/AX 

quickly,  0  ' 

end,  1,H#/EH 

quince,  0  ' 

ended,  1  ,H#/EH 

rack,  1,H#/R  ; 

enter,  1,H#/EH 

rain,  1  ,H#/R 

entity,  1,H#/EH 

rains,  1  ,H#/R 

entries,  1  ,H#/EH 

ran,  1,H#/R 

entry,  1  ,H#/EH 

rate,  1,H#/R 

ever,  1,H#/EH 

rated,  1,H#/R 

explain,  1,H#/IH 

rates,  1,H#/R 

explains,  1,H#/IH 

rating,  1,H#/R 

exquisite,  1,H#/EH 

ray,  1,H#/R 

extended,  1  ,H#/IH 

recall,  1,H#/R 

extending,  1,H#/IH 

recalled,  1,H#/R 

extent,  1,H#/IH 

rector,  1,H#/R 

eye,  1,H#/AY 

related,  1,H#/R 

tree,  1,H#/T 


trees,  1,H#/T 


trend,  1,H#/T 


trick,  1  ,H#/T 


trim,  1,H#/T 


trish,  1,H#/T 


try,  1,H#/T 


uneasily,  1,H#/AH 


unseen,  1,H#/AH 


until,  1,H#/AX 


us,  1,H#/AH 
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fall,  1,H#/F 

relaxed,  1,H#/R 

fallen,  1,H#/F 

release,  1,H#/R 

fill,  1,H#/F 

released,  1,H#/R 

filling,  1,H#/F 

relish,  1,H#/R 

finish,  1,H#/F 

rely,  1,H#/R 

finishing,  1  ,H#/F 

remain,  1,H#/R 

fiscal,  1,H#/F 

remained,  1,H#/R 

fish,  1,H#/F 

remaining,  1  ,H#/R 
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APPENDIX  F.  TRAINING  RECORD  FOR  NETWORK  1  IN  TABLE  4.3 


A.  DESCRIPTION 

The  training  record  in  this  appendix  was  produced  automatically  by  TRNBPX.M  while 
training  Network  1  in  Table  4.3.  The  record  shows  the  three  main  training  phases  discussed  in 
Chapter  4.  First,  the  network  was  trained  on  just  one  training  set  example  for  each  diphone 
(through  epoch  79).  Then,  the  network  was  trained  without  feedback-based  time  alignment  until 
errors  seemed  to  have  reached  a  minimum  (epochs  90  through  224,  with  the  error  mintmnm  at 
epoch  196).  Finally,  training  was  completed  with  feedback-based  time  alignment  (starting  at 
epoch  197).  Several  additional  training  restarts  are  evident  in  this  report.  Those  usually  resulted 
from  problems  with  the  host  (like  system  shutdowns  for  maintenance).  Also,  a  few  more  errors 
were  eliminated  by  lowering  the  MATLAB  momentum  and  error  ratio  terms  and  the  resuming 
training  at  epoch  268. 

The  weights  from  epoch  267  were  used  to  perform  the  word  tests  described  in  Chapter  IV. 
The  (slightly  better)  statistics  from  epoch  278  were  reported  in  Table  4.3. 

The  fields  in  each  output  line  of  the  report  are: 

1.  Epoch  number. 

2.  TRAIN  (indicating  the  data  set  that  the  next  three  fields  refer  to) 

3.  Training  set  Sum  Squared  Error  (SSE)  for  the  epoch 

4.  Training  set  number  of  diphones  correct  (out  of  15,700  in  this  case). 

5.  Training  set  number  of  diphones  not  in  top  three 

6.  TEST  (indicating  the  data  set  that  the  next  three  fields  refer  to) 

7.  Test  set  Sum  Squared  Error  (SSE)  for  the  epoch 

8.  Test  set  number  of  diphones  correct  (out  of  15,700  in  this  case) 

9.  Test  set  number  of  diphones  not  in  top  three 

10.  A  to  indicate  SSE  decreased  (or  at  worst  increased  no  more  than  allowed  by 
the  MATLAB  error  ratio  parameter)  or  a  "***"  to  indicate  that  the  training  set 
SSE  actually  increased  as  a  result  of  the  weight  changes  (which  will  cause 
TRNBPX.M  to  decrease  the  weights,  reduce  the  learning  rate,  a  zero  out  the 
momentum  term  for  an  epoch). 
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B.  TRAINING  RECORD 


NNET  TRAINING  RECORD  Ver.  11.0  (Epochs, SSEs,#Correct,#MoreThan3off) 

Starting  random  with  sqrt(ste)  patch  on  sxlOO  ref  set 
New  Weights,  First  Instance  Only,  NO  BATCHING, 
win_width  =  13,  max_offset  =  1,  seed  =  0 

logsig/logsig:  Hidden  =  256,  Num_Dips  =  157  ,  me  =  0.950000  ,  er  =  1.050000 

Start  Time  =  1 996/5/2  20:1 1 :38 

1,  TRAIN, 1995.848124, 1,154,  - 

2,  TRAIN,236.366505,9,135,  - 

3,  TRAIN, 157.278046,44,105,  - 

4,  TRAIN, 136.608430,60,86,  - 

5,  TRAIN, 119.669860,68,77,  - 

6,  TRAIN, 105.668917,84,64,  - 

7,  TRAIN, 94.05081 0,94,57,  - 

8,  TRAIN,82.302249,101,50,  - 

9,  TRAIN,74.967135,108,46,  - 

10,  TRAIN,67.197963,1 14,40,  - 

11,  TRAIN,58.882383,121,34,  - 

12,  TRAIN,52.839936,125,31,  - 

13,  TRAIN,47.260793,127,30,  - 

14,  TRAIN,42.735514,127,27,  - 

15,  TRAIN,38. 549172,131, 24, - 

16,  TRAIN,35.522931, 134,22,  - 

17,  TRAIN,33.105917,137,20,  - 

18,  TRAIN,30.334545,1 37,20,  - 

19,  TRAIN,28.447628,137,19,  - 

20,  TRAIN,26.658648,1 38,18, - 

21,  TRAIN,25.123675,142,15,  - 

22,  TRAIN,23.417798,143,14,  - 

23,  TRAIN,20.889654,144,12,  - 

24,  TRAIN,18.952710,145,12,  - 

25,  TRAIN,16.897846,145,11,  - 

26,  TRAIN,16.273867,145,11,  - 

27,  TRAIN,15.786864,146,11,  - 

28,  TRAIN,15.446025,146,11,  - 

29,  TRAIN,15.159968,146,11,  - 

30,  TRAIN,14.828331, 147,10,  - 

31,  TRAIN, 14.415502, 147,10,  - 

32,  TRAIN, 13.573549,147,10,  - 

33,  TRAIN, 13.233618,148,8,  - 

34,  TRAIN, 12.648216,150,7,  - 

35,  TRAIN, 10.177223,151, 6,  - 

36,  TRAIN,9.447141, 151,5,  - 

37,  TRAIN,8.1 80229,1 53,4,  - 

38,  TRAIN,8.076680,155,2,  - 

39,  TRAIN,6.640486,155,2,  - 

40,  TRAIN,5.088253,155,2,  - 

41,  TRAIN,4.035936,155,2,  - 

42,  TRAIN,3.733584,155,2,  - 

43,  TRAIN,3.540276,155,2,  - 

44,  TRAIN,3.42051 5,1 55,2,  - 

45,  TRAIN,3.315062,155,2,  - 
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46,  TRAIN, 3.227354, 155,2,  - 

47,  TRAIN, 3.149244,1 55,2,  - 

48,  TRAIN, 3.080232, 155,2,  - 

49,  TRAIN,3.013316, 155,2,  - 

50,  TRAIN,2.953998,155,2,  - 

51,  TRAIN,2.900981, 155,2,  - 

52,  TRAIN,2.850587,155,2,  - 

53,  TRAIN,2.802942,155,2,  - 

54,  TRAIN,2.7551 67,1 55,2,  - 

55,  TRAIN, 2.705984,1 56,1,  - 

56,  TRAIN, 2.606806,1 56,1,  • 

57,  TRAIN, 1.719107,156,1,  - 

58,  TRAIN, 1.6751 80,1 56,1,  - 

59,  TRAIN, 1.617412,156,1,  - 

60,  TRAIN, 1.579698,1 56,1,  - 

61,  TRAIN,1. 548818,156,1,  - 

62,  TRAIN,1. 519882,156,1,  - 

63,  TRAIN,1. 493706,1 56,1,  - 

64,  TRAIN, 1.469554, 156,1,  - 

65,  TRAIN,1. 44721 3,1 56,1,  - 

66,  TRAIN, 1.424741, 156,1,  - 

67,  TRAIN,1 .403695,1 56,1,  - 

68,  TRAIN,1. 384237,1 56,1,  - 

69,  TRAIN,1. 366024,1 56,1,  - 

70,  TRAIN,1. 348750,1 56,1,  - 

71,  TRAIN,1. 331 886,1 56,1,  - 

72,  TRAIN,1. 31 5760,1 56,1,  - 

73,  TRAIN, 1.300565,1 56,1,  - 

74,  TRAIN,1. 285063,1 56,1,  - 

75,  TRAIN,1. 268038,1 56,1,  - 

76,  TRAIN,1. 223247,1 57,0,  - 

77,  TRAIN, 21. 170338, 148, 3,  *** 

78,  TRAIN,1 18.198458,96,38,  *** 

79,  TRAIN,4.269325,156,0,  *** 


NNET  TRAINING  RECORD  -  Train  Ver.  11.0  (Epochs, SSEs,#Correct,#MoreThan3off) 

Resuming  with  patched  STE  ->  sqrt(STE) 

New  Weights,  First  Instance  Only,  NO  BATCHING, 

logsig/logsig:  Hidden  =  256,  Num_Dips  =  157  ,  me  =  0.950000  ,  er  =  1.050000 
Start  Time  =  1996/5/3  18:35:34 

80,  TRAIN,14780.718082,2364,11806,  TEST,41 82.651 892,758,3222,  - 

81,  TRAIN,1 3678.547017,4028, 9808,  TEST,4008.506783,1 040,2871,  - 

82,  TRAIN, 12697.597643,5223,8533,  TEST,3885.237038,1216,2648,  - 

83,  TRAIN,1 1948.871 163,6098,7713,  TEST,3801. 350937,1 31 8,2457,  - 

84,  TRAIN, 11276.061 962,6785,7032,  TEST,3733.762359,1 447,2349, - 

85,  TRAIN,1 0737.701 500,731 9,6567,  TEST,3650.957049,1 528,2248,  - 

86,  TRAIN,1 0222.402759,7869,6063,  TEST,3667.035001 ,1541 ,2208,  - 

87,  TRAIN,9788.99521 6,8307,5748,  TEST,3594.897039, 1626,21 12,  - 

88,  TRAIN,9404.01 2764,8646,5482,  TEST,3581.229172,1641,2044,  - 

89,  TRAIN,9048.987435,8953,5217,  TEST,3546.61 8377,1 665,1 999,  - 

90,  TRAIN,871 3.546272,9260,5020,  TEST,3537.740306,1684,1962,  - 
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NNET  TRAINING  RECORD  -  Train  Ver.  11.0  (Epochs, SSEs,#Correct.#MoreThan3off) 

Resuming  with  patched  STE  ->  sqrt(STE) 

New  Weights,  First  Instance  Only,  NO  BATCHING, 

logsig/logsig:  Hidden  =  256,  Num_Dips  =  157  ,  me  =  0.950000  ,  er  =  1.050000 
Start  Time  =  1996/5/6  17:59:20 

91,  TRAIN,7975.354025,9881,4671,  TEST,3464.289652,1 770, 1881,  - 

92,  TRAIN,7728.632578,10063,4513,  TEST,3455.9371 85, 1786,1 853,  - 

93,  TRAIN,7537.637486, 10269,4371,  TEST,3469.391 083, 1804,1 850,  - 

94,  TRAIN,7375.36631 5,10403,4227,  TEST,3446.215955,1818,1834,  - 

95,  TRAIN,7195.01 1888,10606,41 11,  TEST,3444.325959,1831 ,1776,  - 

96,  TRAIN,7053.448063,10746,4012,  TEST,3427.014973,1856,1777,  - 

97,  TRAIN,6881. 492430,10881, 3903,  TEST,3431.242658,1856,1787,  - 

98,  TRAIN,6731. 500129,1 1033,3818,  TEST,3412.183155,1899,1752,  - 

99,  TRAIN,6572.206341, 11 169,3683,  TEST,341 7.696492, 1874,1 757,  - 

100,  TRAIN,6428.803687,1 1251,3611,  TEST,3406.952575,1 891 ,1727,  - 

101,  TRAIN,6280.780659,1 1386,3511,  TEST,3399.402109,1875,1697,  - 

102,  TRAIN,61 48.791768,1 1505,3395,  TEST,3402.027869,1883,1714,  - 

103,  TRAIN,5994.132957,11641,3298,  TEST,3411.016047,1894,1653,  - 

104,  TRAIN,5899.230026,1 1736,3219,  TEST,3404.408400,191 9,1677,  - 

105,  TRAIN,5726.669752,1 1864,3151,  TEST,3414.412410,1900,1685,  - 

106,  TRAIN,5612.036223,1 1941,3072,  TEST,3398.611651, 1934,1652,  - 

107,  TRAIN,5489.838603,1 2028,3007,  TEST,3405.8821 05,1 91 8,1661,  - 

108,  TRAIN,5349.083391, 12147,2934,  TEST,3398.237526,1 922,1645,  - 

109,  TRAIN,5241.211347,12215,2863,  TEST,3393.825371, 1915,1668,  - 

110,  TRAIN,5138.1 18627,12307,2810,  TEST,3405.201 125,1898,1650,  - 

111,  TRAIN,5038.979382,12366,2745,  TEST,3389.981944,1922,1638,  - 

112,  TRAIN,491 8.462776,12443,271 5,  TEST,3404.274095,1913,1603,  - 

113,  TRAIN,4824.130837,12497,2675,  TEST,3415.498804,1959,1636,  - 

114,  TRAIN,4768.574050, 12536,2642,  TEST,3403.47391 1 ,1935,1632,  - 

115,  TRAIN,4637.086515,12629,2573,  TEST,3441. 186401, 1934,1649,  - 

116,  TRAIN,4589.708875,12682,2552,  TEST,3422.591 756,1 921, 1633,  - 

117,  TRAIN,4493.310687,12751,2539,  TEST,3409.049080,1938,1666,  - 

118,  TRAIN,4476.71 601 7,12763,2487,  TEST,3444.327261 ,1923,1636,  - 

119,  TRAIN,4429.0081 16,12785,2487,  TEST,3404.492692,1 928,1 628,  - 

120,  TRAIN,4355.164895,12824,2458,  TEST,3435.040716, 1938,1595,  - 

121,  TRAIN,4282.983005,12881,2441,  TEST,3447.21 1630,1944,1640,  - 

122,  TRAIN,4308.855568,12879,2429,  TEST,3478.964167,1922,1661 ,  - 

123,  TRAIN,4099.21 3604,12955,2388,  TEST,3449.377926,1949,1634,  - 

124,  TRAIN,4021.061916,12996,2377,  TEST,3452.047168,1921 ,1624,  - 

125,  TRAIN,3952.51 0356,1 3044,2355,  TEST,3432.037867,1937,1642,  - 

126,  TRAIN,3959.01 1646,13027,2351,  TEST,3461.000658,1943,1627,  - 

127,  TRAIN,391 1.466353,13078,2311,  TEST,3482.624967,1899,1642,  - 

128,  TRAIN,3925.6961 54,1 3086,2302,  TEST,3499.253097,1 897,1 653,  - 

129,  TRAIN,3755.1 61 957,1 31 36,2294,  TEST,3447.783052,1 929,1 588,  - 

130,  TRAIN,3642.815471, 13170,2276,  TEST,3493.26221 1,1907,1645,  - 

131,  TRAIN,3759.604539,13140,2237,  TEST,3504.41 1546,1926,1630,  - 

132,  TRAIN,3647.236436,1 321 9,2220,  TEST,3466.46021 7,1 949,1 595,  - 

133,  TRAIN,3631.891870,13218,2178,  TEST,3482.082856,1938,1622,  - 

134,  TRAIN,3693.01 7063,1 3238,21 44,  TEST,3531. 928843,191 1,1607,  - 

135,  TRAIN,3619.030788,13283,2142,  TEST,3549.534705,1905,1630,  - 

136,  TRAIN,3633.557710,13271,2113,  TEST,3509.971 139,1928,1626,  - 

137,  TRAIN,3612.694414,13292,2118,  TEST,3569.236360,1883,1632,  - 

138,  TRAIN,3746.247789,1 3263,2093,  TEST,3576.428130,1909,1640,  - 
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I 


1 39,  TRAIN, 3671 .266506,1 3277,21 01 

1 40,  TRAIN,3771 .082261 ,1 3241 ,2081 

141 ,  TRAIN,3661 .457360,13302,2061 

142,  TRAIN,3658.837530,1 3294,2043 

143,  TRAIN,3905.667369,1 3222,2040 

144,  TRAIN,2981 .309782,1 3503,2004 

145,  TRAIN,2599.014086,1 3582,1993 

146,  TRAIN,2472.661 1 90,1 361 1 ,1958 

1 47,  TRAIN, 2401 .042949, 1 3644,1 928 

1 48,  TRAIN,2355.773346,1 3661 ,1934 

1 49,  TRAIN, 231 0.97741 5,1 3690,1 905 

150,  TRAIN, 2260.435881, 1371 1,1 887 

151,  TRAIN, 2232.840560, 1 371 9, 1 881 

1 52,  TRAIN, 2265.422478,1 3743,1 846 

1 53,  TRAIN, 231 9.559276,1 3756,1 841 

1 54,  TRAIN, 241 9.891 1 1 0,1 3766,1 81 8 

1 55,  TRAIN, 271 9.455478,1 371 7,1 808 

1 56,  TRAIN, 2221 .428023,1 3790,1 784 

1 57,  TRAIN, 2066.500580,1 3838,1 757 

1 58,  TRAIN,2028.797252,1 3861 ,1755 

159,  TRAIN,1999.473567,13879,1731 

160,  TRAIN, 1973.425495,13899,1720 

161,  TRAIN.1 946.236275,1 3920,1 689 

1 62,  TRAIN,  1 927.740203,1 3943,1 677 

1 63,  TRAIN,  1 909.084655,1 3947,1 670 

164,  TRAIN, 1896.128762,13957,1653 

165,  TRAIN,1 884.979569,1 3984,1 643 

1 66,  TRAIN, 1 891 .31371 7,1 3990,1 636 

1 67,  TRAIN, 1 901 .362461 ,1 4002,1 626 

168,  TRAIN,2027.588978,13997,1611 

169,  TRAIN, 1842.233871, 14016,1611 

1 70,  TRAIN,  1 794.783875,1 4039, 1 597 

171,  TRAIN,1779.636690,14047,1588 

172,  TRAIN,1764.067088,14059,1574 

173,  TRAIN,1748.054207,1 4074,1573 

174,  TRAIN,1735.403566,1 4087,1561 

175,  TRAIN, 1721. 130731, 14093,1557 

1 76,  TRAIN,  1 709.732409,1 41 02, 1 537 

177,  TRAIN,1714.751268,1 41 12,1539 

178,  TRAIN,1727.191214,14120,1534 

179,  TRAIN, 1718.408081, 14125, 1521 

1 80,  TRAIN, 1 763.553048,1 41 29,1 520 

181 ,  TRAIN,4431 .622547,1301 1 ,1775, 

1 82,  TRAIN, 1 724.1 28388,1 41 32,1 51 7, 

183,  TRAIN,1645.903958,14149,1510 

184,  TRAIN,1627.4081 95,14153,1503 

185,  TRAIN,1616.956389,14160,1505 

1 86,  TRAIN ,1610.1 09825, 1 41 67, 1 500 

187,  TRAIN,1605.054405,14174,1493 

188,  TRAIN,1593.889004,14182,1489 

189,  TRAIN, 1583.187422,14185,1487, 

1 90,  TRAIN,  1 576.221 393,1 41 89,1 488 

1 91 ,  TRAIN, 1 575.436975,1 41 88,1 479 

1 92,  TRAIN,  1 565.743332,1 41 94,1 476, 

1 93,  TRAIN, 1 563.520329,1 41 96,1 467 

194,  TRAIN,6501 .681305,1 1382,2523, 


TEST,3545.3691 32,1 923,1 623,  - 
TEST,3620.893958,1 864,1 660,  - 
TEST,3605.696769,1913,1642,  - 
TEST,3601 .342024,1 896,1 649,  - 
TEST,3598.161397, 1863,1644,  *** 
TEST,3480.571218,1958,1579,  - 
TEST,3464.565822,1 981, 1573,  - 
TEST,3465.696400,1 981, 1529,  - 
TEST,3465.150248,2002,1533,  - 
TEST,3468.859565,1 977, 1524,  - 
TEST,3455.253232,1 995,1546,  - 
TEST,3469.646080,1992,1514,  - 
TEST,3457.880977,1 983,1 528,  - 
'TEST,3503.785742,2002,1544,  - 
TEST,351 6.270730,1 955,1 576,  - 
TEST,3545.553271, 1951, 1578,  - 
TEST,3632.305073,1875,1614,  *** 
TEST,3465.320664, 1982, 1524,  - 
TEST,3454.490694,1994,1527,  - 
TEST,3474.298621, 1984,1514,  - 
TEST,3458.909890,1 975,1 508,  - 
TEST,3461 .074925,1 983,1 495,  - 
TEST,3455.748942,2000,1508,  - 
TEST,3458.979386,2008,1492,  - 
TEST,3466.627587,1 991, 1499,  - 
TEST,3475.295782,1 998,1500,  - 
TEST,3468.432737,2008,1516,  - 
TEST,3487.016498,2013,1527,  - 
TEST, 3493.379447,1 987,1515,  - 
TEST,3576.740083,1 937,1 592, 
TEST,3471 .469868,2000,1 475,  - 
TEST, 3459.058367,2015,1484,  - 
TEST,3462.125653,2006,1480,  - 
TEST,3468.776442,2012,1506,  - 
TEST,3466.698850,2005,1501,  - 
TEST,3472.019076,2021,1514,  - 
TEST,3471 .06341 4,2009,1 511,  - 
TEST,3468.940027,2014,1486,  - 
TEST,3474.1 16880,2000,1489,  - 
TEST,3467.444379,1993,1516,  - 
TEST,3505.28051 1,1 989, 1546,  - 
TEST,3542.559993,1 984,1 531,  - 
TEST,3900.772867,1 638,1 900,  *** 
TEST,3494.32401 1,1 998,1 495,  - 
TEST,3480.092776,2014,1499,  - 
TEST, 3471. 566624,2027,1 500,  - 
TEST,3470.751533,2013,1505,  - 
TEST,3473.814005,2015,1495,  - 
TEST,3466.0901 11,2012,1494,  - 
TEST, 3468.167975, 2002,1496,  - 
TEST,3471. 082698, 201 1,1499,  - 
TEST,3481 .495820,2012,1 485,  - 
TEST,3472.419597,2012,1484,  - 
TEST,3464.885398,2022,1488,  - 
TEST,3485.699446,2000,1502,  - 
TEST,3992.71 1208,1 594,2006,  *** 


I 

I 
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1 95,  TRAIN,  1 567.098827,1 41 99,1 468, 

1 96,  TRAIN,  1 548.229053,1 4206,1 465, 

1 97,  TRAIN, 1 544.322378,1 4207,1 461 , 

198,  TRAIN,1 539.963259,14214,1 459, 

1 99,  TRAIN, 1 536.591 375,1 421 9,1 454, 

200,  TRAIN,1532.129389,14222,1449, 

201,  TRAIN,1530.154930,14222,1446, 

202,  TRAIN, 1 531 .01 0608,1 4232,1 440, 

203,  TRAIN, 1 553.950750,1 4231 ,1436, 

204,  TRAIN,5621 .547718,12048,2156, 

205,  TRAIN,1532.471058,14239,1438, 

206,  TRAIN,  1 509.441 995,1 4244,1 434, 

207,  TRAIN,1503.407466,14246,1427, 

208,  TRAIN,1497.143728,14246,1424, 

209,  TRAIN, 1 493.893521 ,14248,1 420, 

21 0,  TRAIN,1 494.085322,1 4252,1 423, 

21 1 ,  TRAIN,1486.634056,14252,1 417, 

212,  TRAIN, 1485.370179,14257,1412, 

213,  TRAIN,1987.182866,14097,1426, 

214,  TRAIN,1481. 538272, 14261, 1413, 

215,  TRAIN,1475.516999,14267,1408, 

21 6,  TRAIN, 1 473.540406,1 4268,1 405, 

217,  TRAIN, 1468.949037,14274,1402, 

218,  TRAIN,  1 466.651 845,1 4275, 1 400, 

219,  TRAIN,1465.825320,14276,1397, 

220,  TRAIN, 1460.092556, 14281, 1398, 

221,  TRAIN,1456.818287,14280,1396, 

222,  TRAIN,1454.127846,14281,1393, 

223,  TRAIN,1451. 080625,14285,1 390, 

224,  TRAIN,  1 448.633083,1 4285,1 391 , 


TEST,3462.907920,2016,1472,  - 
TEST,3465.566050,2008,1472,  - 
TEST, 3468.253940,2007,1 485,  - 
TEST, 3475.844092, 2006,1 488,  - 
TEST.3466.133498,2039,1480,  - 
TEST,3467.41 8349,201 9,1 475,  - 
TEST, 3464.48741 1,2019,1502,  - 
TEST,3470.526331, 201 8,1498,  - 
TEST,3491. 823689,201 7,1 497,  - 
TEST,4009.930183,1587,1994,  *** 
TEST, 3478.284700,2023,1510,  - 
TEST,3468.1 26097,201 4,1 511,  - 
TEST,3465.847702,201 1,1504,  - 
TEST,3469.082578,201 0,1496,  - 
TEST,3471. 948863,201 3,1 489,  - 
TEST,3463.616969,2013,1511,  - 
TEST,3466.970954,2010,1497,  - 
TEST,3470.612800,1997,1486,  - 
TEST,3796.316944,1769,1823,  *** 
TEST,3469.1 87955,2009,1 497,  - 
TEST,3468.394800,2015,1504,  - 
TEST,3469.526205,201 1,1496,  - 
TEST,3475.300060,2005,1488,  - 
TEST,3472.505020,2005,1488,  - 
TEST,3480.681690,1990,1473,  - 
TEST,3464.965838,2009,1487,  - 
TEST,3473.743757,1 999,1 486,  - 
TEST,3476.27631 2,201 3,1 479,  - 
TEST,3477.229580,2008,1492,  - 
TEST,3474.922538,2005,1484,  - 


NNET  TRAINING  RECORD  Ver.  11.0  (LI)  (Epochs,SSEs,#Correct,#MoreThan3off) 


Starting  +/-  2  non-fixed  TA  on  LI  SX100  sep  tst  set 

Same  Weights,  Entire  Fiie  Set,  NO  BATCHING,  Time-Aligning,  Separate  Test  Set, 
win_width  =  13,  max_offset  =  2,  seed  =  0 

iogsig/logsig:  Hidden  =  256,  Num_Dips  =  157  ,  me  =  0.950000  ,  er  =  1.050000 


Start  Time  =  1996/5/19  12:44:48 

1 97,  TRAIN, 1 51 3.541 055,1 4604,829, 

198,  TRAIN,1439.394587,14621,820, 

199,  TRAIN,1386.347128,14643,810, 

200,  TRAIN,1343.21 1479,14661,813, 

201,  TRAIN,1333.41 1093,14688,787, 

202,  TRAIN, 1290.893539,14702, 778, 

203,  TRAIN,1256.046615,14720,764, 

204,  TRAIN,1225.487427,14759,744, 

205,  TRAIN,1 195.314909,14790,724, 

206,  TRAIN, 1 1 82.1 40609,1 481 7,702, 

207,  TRAIN.1 142.391678,14818,696, 

208,  TRAIN,1 113.966376,14832,682, 

209,  TRAIN, 1089.975837,14844, 680, 

210,  TRAIN.1 067.478307, 1 4858,665, 

21 1 ,  TRAIN, 1072.047388,14875,663, 

212,  TRAIN, 1041 .320592,14891, 630, 

213,  TRAIN,1019.214789,14912,609, 


TEST,251 8.772806,3083,603,  - 
TEST,251 4.301 597,3097,596,  - 
TEST,251 0.776058,3086,600,  - 
TEST,2503.034539,3097,593,  - 
TEST,2502.422200,3094,603,  - 
TEST,2489.706907, 3099,600,  - 
TEST,2494.426775,3101,599,  - 
TEST,2494.226942,31 00,596,  - 
TEST,2488.542763,3086,585,  - 
TEST,2489.232357,3090,586,  - 
TEST, 2487.928071 ,3081, 597,  - 
TEST,2481. 320034, 3094,589,  - 
TEST,2479.224909,31 02,583,  - 
TEST,2476.41 4089,3092,566,  - 
TEST,2474.922586, 3074,561,  - 
TEST,2473.9231 65,3085,568,  - 
TEST, 2476.397882,3089,578,  - 
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214,  TRAIN, 997.339690, 14947, 601,  TEST,2473.896079,31 12,580,  - 

215,  TRAIN, 971. 53771 4,1 4958, 588,  TEST,2470.376026, 3086,590,  - 

216,  TRAIN,968.075922,1 4987,560,  TEST,2487.1 91 149,3081,578,  - 


NNET  TRAINING  RECORD  Ver.  11.0  (LI  Resume)  (Epochs, SSEs,#Correct,#MoreThan3off) 
Resuming  with  same  wts  &  momentum 

Same  Weights,  Entire  File  Set,  NO  BATCHING,  Time-Aligning,  Separate  Test  Set, 
win_width  =  13,  max_offset  =  2,  seed  =  0 

logsig/logsig:  Hidden  =  256,  Num_Dips  =  157  ,  me  =  0.950000  ,  er  =  1.050000 


Start  Time  =  1996/5/23  10:5:2 

217,  TRAIN,945.143514,14995,553, 

218,  TRAIN,912.372877,15007,544, 

219,  TRAIN, 885.840668,15024,546, 

220,  TRAIN,866.105779,15035,547, 

221,  TRAIN, 857.757543,1 5049,535, 

222,  TRAIN, 831. 890409,1 5062,529, 

223,  TRAIN, 809.709769,1 5070,520, 

224,  TRAIN,793.529201, 15077,514, 

225,  TRAIN, 775.868605,15089,503, 

226,  TRAIN,782. 257357,15100, 506, 

227,  TRAIN,752.943357.151 11,495, 

228,  TRAIN,735.999096,15122,491, 

229,  TRAIN,722.462653,1 51 27,480, 

230,  TRAIN, 709.343887,15140,480, 

231,  TRAIN, 719.088278,15144,478, 

232,  TRAIN,695.355814,15151,478, 

233,  TRAIN,677.722562,15159,470, 

234,  TRAIN,661.994739,15167,465, 

235,  TRAIN,653.034637,1 51 72,456, 

236,  TRAIN, 666.8651 35,15182,448, 

237,  TRAIN,642.709643,15194,430, 

238,  TRAIN,626. 330215,15205, 419, 

239,  TRAIN,608.809529,15226,397, 

240,  TRAIN, 598. 502574,15245, 393, 

241,  TRAIN, 631. 495208,15264, 357, 

242,  TRAIN,592.654847,1 5263,361, 

243,  TRAIN, 554.659526,1 5273,348, 

244,  TRAIN, 537.495798,1 5283,341, 

245,  TRAIN,524.297491 ,1 5291 ,344, 

246,  TRAIN,538.024469,15299,342, 

247,  TRAIN, 514. 665300,15307, 331, 

248,  TRAIN,496.897055,15313,320, 

249,  TRAIN, 488.029933,15316,324, 

250,  TRAIN,476.621329,15325,314, 

251,  TRAIN,495.044536,15328,315, 

252,  TRAIN,481. 71 8293,15333,310, 

253,  TRAIN, 459.882916,15338, 310, 

254,  TRAIN, 450.579953,15345, 302, 

255,  TRAIN, 441. 116910,15348,300, 

256,  TRAIN,450.96671 1,15354,300, 

257,  TRAIN, 446.203603,15357,293, 

258,  TRAIN, 676.023889,15321,296, 

259,  TRAIN, 431 .930594,15366,289, 

260,  TRAIN, 408. 030417,15371, 293, 


TEST,2480.250340,3084,597, 
TEST,2475.951 859,3082,570, 
TEST,2474. 1 751 37,3069,595, 
TEST,2473.65091 3,3076,584, 
TEST.2479.908912,3065,579, 
TEST,2469.921061 ,3082,581 , 
TEST,2470.608726,3074,587, 
TEST, 2464.441 31 0,3076,574, 
TEST,2468.21 5410,3065,568, 
TEST,2474.48371 6,3075,577, 
TEST,2474.985343,3072,570, 
TEST,2471 .898707,3085,563, 
TEST, 2475.751 305,3088,557, 
TEST,2468. 1 32626,3086,565, 
TEST,2475.64241 0,3044,563, 
TEST,2477.686291 ,3089,569, 
TEST,2470. 1 65935,3066,576, 
TEST,2469.91 8242,3083,557, 
TEST,2479.341 104,3089,565, 
TEST,2486.021 364,3057,580, 
TEST,2474.2261 24,3070,568, 
TEST,2463.81 5564,3078,559, 
TEST,2469.1 51 962,3067,558, 
TEST,2471 .260917,3081 ,547, 
TEST,251 1 .61 5090,3000,61 4, 
TEST,2477.923453,3078,552, 
TEST,2477.84881 2,3065,536, 
TEST,2466.3661 35,3084,534, 
TEST,2456.1 8471 9,3089,540, 
TEST, 2474.91 4041 ,3064,550, 
TEST,2460.36001 9,3091 ,541 , 
TEST,2462.60371 9,31 05,530, 
TEST,2464.524234, 3095,537, 
TEST,2449.183599,3103,526, 
TEST,2468.722743,3064,530, 
TEST,2490.397923,3044,547, 
TEST,2474.602249, 3075,531 , 
TEST,2453.66451 3,31 1 1 ,544, 
TEST,2461 .39761 6,31 08,525, 
TEST,2454.388121 ,3089,539, 
TEST,2482.220462, 3063,555, 
TEST,271 7.71 6462,2730,835, 
TEST,2472.641 71 5,3078,540, 
TEST,2453.043495,31 20,527, 
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261,  TRAIN, 409.799589, 15376, 286, 

262,  TRAIN,393.362240,15376,281, 

263,  TRAIN,388.592518,15379,278, 

264,  TRAIN,384.084181, 15383,275, 

265,  TRAIN,380.538080, 15388,268, 

266,  TRAIN,396.41 8548,1 5392,271, 

267,  TRAIN,380.869589,15397,266, 

268,  TRAIN, 370.767868,15402,256, 

269,  TRAIN,374.873139, 15402,254, 

270,  TRAIN,377.943383,15410,256, 

271,  TRAIN, 592.656381, 15387,254, 

272,  TRAIN,372.435764,1 541 5,260, 

273,  TRAIN,344.742953,15418,252, 

274,  TRAIN,339.229930, 15421, 252, 

275,  TRAIN, 331. 461 420,1 5425,252, 

276,  TRAIN,334.551 198,1 5427,249, 

277,  TRAIN,325.287235,15431,245, 

278,  TRAIN,318.943251, 15430,248, 

279,  TRAIN,315.133087,15431,244, 

280,  TRAIN,31 3.279966,1 5433,245, 

281,  TRAIN,31 4.683572,1 5436,241, 

282,  TRAIN,313.312689, 15438,239, 

283,  TRAIN,305.167799, 15441,238, 

284,  TRAIN,303.674744,1 5441, 236, 

285,  TRAIN,302.557163,15444,239, 

286,  TRAIN,1669.232315,14800,412 

287,  TRAIN,306.806272, 15444,234, 

288,  TRAIN,295.763487,15444,231, 

289,  TRAIN,293.188221, 15447,233, 

290,  TRAIN,292.283630,15448,232, 


TEST,2454.607747,31 23,523,  - 
TEST,2446.5881 35,3136,528,  - 
TEST,2440.91 0335,31 33,527,  - 
TEST,2440.398535,31 42,523,  - 
TEST,2447.538966,31 33,510,  - 
TEST,2467.5031 74,31 00,552,  - 
TEST,2450.123434,3104,540,  - 
TEST,2446.735704,31 14,530,  - 
TEST,2453.847634,3095,544,  - 
TEST,2477.396752,3074,546,  - 
TEST,2794.888396,2707,811,  *** 
TEST,2480.007066,3076,532,  - 
TEST,2444.746685,31 03,527,  - 
TEST,2442.835629,31 18,533,  - 
TEST,2447.01 8322,31 36,525,  - 
TEST,2446.632702,31 05,526,  - 
TEST,2440.712682,3130,516,  - 
TEST,2448.6081 29,31 07,527,  - 
TEST,2449.359481, 31 20,525,  - 
TEST,2442.676588,31 32,524,  - 
TEST,2455.770905,31 06,537,  - 
TEST,2454.41 7279,31 22,523,  - 
TEST,2456.547355,31 07,535,  - 
TEST,2445.312476,31 14,520,  - 
TEST,2451 .7751 72,31 17,522,  - 
TEST,3333.798841,2153,1320,  *** 
TEST,2467. 002340,31 09,522,  - 
TEST,2436.862463,3121,528,  - 
TEST,2443.595743,31 37,521,  - 
TEST,2440.009398,31 40,523,  - 


NNET  TRAINING  RECORD  Ver.  11.0  (LI)  (Epochs,SSEs,#Correct,#MoreThan3off) 


starting  again  at  e265  (bestwts) 

Same  Weights,  Entire  File  Set,  NO  BATCHING,  Time-Aligning,  Separate  Test  Set, 
win_width  =  13,  max_offset  =  2,  seed  =  0 

logsig/logsig:  Hidden  =  256,  Num_Dips  =  157  ,  me  =  0.950000  ,  er  =  1.050000 
Start  Time  =  1996/6/7  14:2:13 

266,  TRAIN,378.000310,15396,264,  TEST,2444.860403,31 56,510,  - 

267,  TRAIN, 372.059882, 15399, 264,  TEST,2446.966214,3151,506,  - 

268,  TRAIN,369.200690,1 5401, 261,  TEST,2447.49061 1,3152,507,  - 

269,  TRAIN,367.251 847,1 5402,262,  TEST,2446.306237,31 44,510,  - 

270,  TRAIN,365.497574,1 5402,257,  TEST,2451 .426672,31 50,510,  - 


NNET  TRAINING  RECORD  Ver.  11.0  (LI)  (Epochs,SSEs,#Correct,#MoreThan3off) 
Resuming  at  bestwts  with  lower  me  &  er 

Same  Weights,  Entire  File  Set,  NO  BATCHING,  Time-Aligning,  Separate  Test  Set, 
win_width  =  13,  max_offset  =  2,  seed  =  0 

logsig/logsig:  Hidden  =  256,  Num_Dips  =  157  ,  mo  =  0.900000  ,  er  =  1.040000 
Start  Time  =  1996/6/9  0:39:1 

268,  TRAIN,368.908046,1 5400,261,  TEST,2447.784763,3151,506,  - 

269,  TRAIN,366.809044, 15402,259,  TEST,2447.1 17671,3149,510,  - 

270,  TRAIN,365.067769, 15402,258,  TEST,2449.033369,31 53,506,- 
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271, 

272, 

273, 

274, 

275, 

276, 

277, 

278, 

279, 

280, 
281, 
282, 

283, 

284, 

285, 

286, 

287, 

288, 
289, 


TRAIN,363.9031 21 ,1 5403,257, 
TRAIN, 362.567736,1 5404,255, 
TRAIN,360.5461 58,1 5405,255, 
TRAIN, 358.825661 ,1 5409,254, 
TRAIN, 357.0921 1 2,1 541 1 ,254, 
TRAIN,355.1001 94,1 5413,253, 
TRAIN,355.513832,1 5414,253, 
TRAIN, 352.115412, 15413, 253, 
TRAIN, 349.741 1 96,1 5416,251 , 
TRAIN,348.319405,15414,251, 
TRAIN,346.461842,15416.253, 
TRAIN,343.155424,15416,252, 
TRAIN, 340. 700546,15417, 251, 
TRAIN,338.867598,1 541 9,252, 
TRAIN,337.138286,1541 9,252, 
TRAIN,335.532533,1 5421 ,252, 
TRAIN, 336.559977,1 5422,254, 
TRAIN,334.654073,1 5424,253, 
TRAIN,332.514629,15425,254, 


TEST,2448.651252,3154.505, 
TEST, 2447.202886, 31 55,509, 
TEST,2444.41 2520,31 57,509, 
TEST,2445.037272,31 58,505, 
TEST,2442.857028,31 60,501 , 
TEST,2440.393299,31 59,501 , 
TEST,2441 .208379,31 58,498, 
TEST, 2439.095758, 31 54,496 
TEST,2437.1 64771 ,31 57,501 , 
TEST, 2436.33721 5,31 50,498, 
TEST,2440.241 266,31 57,502, 
TEST,2440.4841 54,31 57,502, 
TEST,2439.1 6661 3,31 49,506, 
TEST,2436.070799,31 54,507, 
TEST,2435.799594,31 52,503, 
TEST,2435.036998,31 56,503, 
TEST,2439.458499,31 59,505, 
TEST,2437.624035, 31 49,507, 
TEST,2438.671 405,31 44,509, 
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APPENDIX  G.  WORD  RECOGNITION  RESULTS 


A.  DESCRIPTION 

The  following  report  was  produced  by  RECOGNIZ.CL  during  the  word  tests  described  in 
Chapter  IV.  This  report  is  intended  for  statistical  purposes,  not  syntactic  or  semantic  analysis. 
Each  line  in  the  report  is  a  LISP  list  with  nine  elements: 

1.  The  sample  number  where  the  correct  word  appeared  strongest  in  the  output  stream  (zero 
if  the  word  was  not  detected). 

2.  The  correct  word. 

3.  The  word  confidence  value  for  the  word  candidate  described  in  items  one  and  two  (zero  if 
the  word  was  not  detected). 

4.  The  ranking  of  the  correct  word  among  all  word  candidates  with  the  same  starting  time 
(one  if  the  word  was  not  detected). 

5.  The  global  ranking  of  the  correct  word  (among  all  words  in  the  output  stream).  This  is 
only  marginally  useful  in  continuous  speech  recognition,  as  discussed  in  the  Chapter  IV 
(the  total  number  of  word  candidates  if  the  correct  word  was  not  detected). 

6.  The  strongest  word  candidate  with  the  same  start  time  as  the  correct  word  (nil  if  the 
correct  word  was  not  detected). 

7.  The  word  confidence  value  for  the  word  candidate  described  in  item  six. 

8.  The  strongest  word  candidate  among  all  words  in  the  output  stream. 

9.  The  word  confidence  value  for  the  word  candidate  described  in  item  eight. 

B.  RESULTS 

(3933  are  0.1873845  7  482  ART  0.6646775  IN  0.9814075) 

(0  for  0  1  871  nil  0  NEXT  0.7313204) 

(2736  quick  0.681 1 147  1  16  QUICK  0.681 1 147  WICK  0.960505) 

(2052  in  0.46714053  15  44  LINE  0.67273206  AM  0.922388) 

(3078  in  0.4691025  1  7  IN  0.4691025  THE  0.929525) 

(0  consists  0  1  1644  nil  0  THE  0.761428) 

(3591  of  0.821699  1  1  OF  0.821699  OF  0.821699) 

(4959  and  0.14384134  10  264  LIE  0.38751552  LIE  0.748632) 

(2052  are  0.127771  4  365  US  0.49286598  US  0.971366) 

(3420  made  0.5320155  9  28  MAY  0.994712  MAY  0.994712) 

(3249  of  0.0054145  1  384  OF  0.0054145  AM  0.8061645) 

(3591  are  0.482579  5  20  ROAM  0.50531733  BE  0.946454) 

(2394  in  0.5402075  8  20  DIM  0.94087  DIM  0.94087) 
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(5130  stockings  0.29724526  59  1 1 16  TORN  0.8617256  US  0.99528396) 

(3762  run  0.304104  1  230  RUN  0.304104  MAY  0.794837) 

(4788  dishes  0.27224898  21  693  DENSE  0.71203  HE  0.925552) 

(2736  makes  0.93518746  2  3  AM  0.988569  AM  0.988569) 

(4275  and  0.029465998  11  563  AIN~T  0.27141333  IT  0.81334496) 

(2052  intelligence  0.40138015  7  274  INTENT  0.6046135  TEN  0.986287) 
(2907  is  0.20747401  6  77  EAT  0.4195245  TELL  0.555865) 

(0  for  0  1  770  nil  0  ILL  0.835065) 

(0  become  0  1  1523  nil  0  BE  0.999408) 

(5130  artists  0.3802265  6  1 13  RIM  0.528966  WE  0.955054) 

(5130  fish  0.950968  1  1  FISH  0.950968  FISH  0.950968) 

(4617  leap  0.5617575  1  7  LEAP  0.5617575  KNEE  0.870113) 

(4446  frantically  0.7132638  1  20  FRANTICALLY  0.7132638  LEAF  0.955064) 
(2736  of  0.120827004  5  202  SEEN  0.422501  IS  0.63039) 

(12312  cat  0.73065  1  29  AT  0.73065  TELL  0.942454) 

(3762  my  0.752832  3  3  MIND  0.814682  MIND  0.814682) 

(2907  by  0.999798  1  1  BUY  0.999798  BUY  0.999798) 

(1539  of  0.3058325  3  48  AGREE  0.32896432  RAIN  0.495736) 

(3420  may  0.942935  1  1  MAY  0.942935  MAY  0.942935) 

(2907  money  0.783675  4  18  MY  0.8917825  KNEE  0.996276) 

(3933  by  0.998938  1  1  BUY  0.998938  BUY  0.998938) 

(0  hard  0  1  1835  nil  0  ARE  0.980915) 

(6156  candy  0.833961  1  6  CANDY  0.833961  TEA  0.9668495) 

(1881  are  0.4910435  2  66  COLLIE  0.4913785  REAL  0.976672) 

(0  for  0  1  553  nil  0  LEAD  0.78059447) 

(4275  big  0.762662  3  3  IN  0.8091895  IN  0.8091895) 

(4275  ambled  0.35976335  2  5 1  MILK  0.36078367  PLAN  0.61251634) 

(2565  increases  0.137292  24  824  NEAREST  0.5557577  EASE  0.969679) 
(1026  is  0.254825  6  218  WRIST  0.613279  ARE  0.980894) 

(0  controlled  0  1  1429  nil  0  LOT  0.917583) 

(3591  from  0.3059175  13  316  ROB  0.60963464  ARE  0.981709) 

(1368  is  0.4983955  14  96  EAR  0.973136  EAR  0.973136) 

(0  for  0  1  634  nil  0  CALF  0.7700615) 

(4104  buyer  0.98096  3  3  BUY  0.997935  BUY  0.997935) 

(3762  money  0.88202864  2  8  US  0.9071 145  ACE  0.9591505) 

(0  for  0  1  437  nil  0  EASE  0.9729785) 

(5814  all  0.250349  1 1  225  LET  0.43589398  SALT  0.570993) 

(1881  are  0.49441 198  6  73  LOCK  0.98185  LOCK  0.9870567) 

(4275  expected  0.56739616  17  103  STRUCK  0.8219268  LOCK  0.871 18053) 
(2907  comes  0.31 1604  12  312  DOLL  0.56631  ELSE  0.9000225) 

(4788  for  0.12270367  7  362  RUN  0.493005  NONE  0.933492) 

(1710  his  0.5756375  1  2  HIS  0.5756375  THE  0.580016) 

(4104  broken  0.57375675  12  37  ROCK  0.8566167  PA  0.871716) 

(2565  he  0.9774945  1  1  HE  0.9774945  HE  0.9774945) 

(0  from  0  1  431  nil  0  ARM  0.7998555) 
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(0  every  0  1  752  nil  0  EAT  0.9720475) 

(3591  in  0.54388297  5  13  IT  0.61600196  SHE  0.7135) 

(2052  of  0.12831 1  6  367  ODD  0.413048  KNEE  0.904689) 

(0  cleans  0  1  1309  nil  0  ODD  0.8228425) 

(3762  for  0.4036405  3  36  ALL  0.7619205  ILL  0.94701254) 

(0  every  0  1  922  nil  0  TEA  0.7970245) 

(1368  related  0.27530575  11  362  ROCKETS  0.53542477  AT  0.92029953) 
(0  my  0  1  482  nil  0  DOLL  0.679531) 

(3420  ability  0.416951  10  54  BILL  0.6112095  BUY  0.984153) 

(3249  is  0.178628  3  83  WICK  0.506303  JAW  0.513643) 

(3249  made  0.042727  13  437  ATE  0.124715  TREE  0.6406955) 

(3420  of  0.06788  2  370  VAULT  0.091452  TREE  0.545698) 

(2394  are  0.219975  17  195  ROCK  0.659722  AM  0.911262) 

(3420  in  0.49384397  2  15  KNICK-  0.514528  THE  0.933759) 

(2736  stockings  0.76056767  4  7  SAY  0.90993404  NONE  0.9181985) 

(2223  run  0.66080195  1  4  RUN  0.66080195  DONE  0.89926696) 

(5301  dishes  0.481654  1  52  DISHES  0.481654  DISH  0.9055015) 

(2907  makes  0.881495  1  8  MAKES  0.881495  BUY  0.99428) 

(0  and  0  1  335  nil  0  TEN  0.7172965) 

(4446  is  0.38257  2  71  LESS  0.4374865  HE  0.9108505) 

(5130  for  0.738793  5  16  ON  0.90576303  KNEE  0.9634295) 

(4959  ended  0.23284724  1 1  276  DUMB  0.52641404  TEA  0.977053) 

(4788  please  0.36718166  21  296  EAT  0.6738665  THESE  0.89815354) 
(2736  cleaners  0.504659  16  124  KEY  0.7781905  KNEES  0.9892255) 

(3933  for  0.515878  1  11  OR  0.515878  ADD  0.6838985) 

(1539  cleaners  0.33306125  34  533  QUEEN  0.61019766  KNEE  0.87358904) 
(6156  all  0.121824  15  304  LOCKED  0.33104834  US  0.934741) 

(4446  are  0.866431  1  2  ARE  0.866431  ILL  0.9536) 

(3249  expected  0.61259025  3  40  EXPECT  0.7148502  PECK  0.930991) 
(2736  comes  0.33323964  7  182  CAUGHT  0.495526  RAW  0.9304695) 

(0  for  0  1  565  nil  0  AIN-T  0.9800635) 

(2907  his  0.577604  1  2  HIS  0.577604  THE  0.663486) 

(4446  broken  0.143412  39  1026 1~LL  0.4946285  TELL  0.864295) 

(3420  he  0.57342803  1  4  HE  0.57342803  KNEE  0.69167054) 

(4446  from  0.5095253  1  10  FROM  0.5095253  RUN  0.66486704) 

(6156  every  0.7046505  2  3  EVER  0.92598  EASE  0.965269) 

(2565  in  0.2646255  10  58  IF  0.5510375  IF  0.5510375) 

(3762  of  0.022181  18  571  LEAST  0.30533427  KNEE  0.48568553) 

(4788  cleaned  0.251 17275  14  81  RAY  0.39502448  LEAF  0.5082985) 

(3933  for  0.5500345  3  8  ALL  0.6511795  ILL  0.781802) 

(7353  are  0.812259  1  5  ARE  0.812259  KNEE  0.999797) 

(4446  all  0.330042  2  1 15  CORY  0.53609467  BE  0.859098) 

(4959  errors  0.7802375  2  6  RAW  0.8750295  OR  0.910335) 

(2907  nearest  0.28977123  6  219  NEAR  0.46660066  ERRORS  0.846212) 
(3591  may  0.999945  1  1  MAY  0.999945  MAY  0.999945) 
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(2907  be  0.966185  1  2  BE  0.966185  TEA  0.980755) 

(3078  distance  0.534456  14  45  DIM  0.79010266  IS  0.984571) 

(3078  buys  0.8794905  1  1  EYES  0.8794905  EYES  0.8794905) 

(0  catastrophic  0  1  2944  nil  0  CAST  0.977149) 

(4104  intelligence  0.8807526  1  7  INTELLIGENCE  0.8807526  EDGE  0.9434935) 
(0  is  0  1  399  nil  0  KNEE  0.7552695) 

(3933  for  0.518121  2  31  REAL  0.51882535  KEY  0.95744205) 

(0  become  0  1  1646  nil  0  WE  0.96533096) 

(0  artists  0  1  1954  nil  0  TEA  0.992695) 

(4446  fish  0.77679  1  2  HSH  0.77679  THEY  0.81535) 

(3591  leap  0.881914  1  6  LEAP  0.881914  KNEE  0.976488) 

(4275  frantically  0.68338925  7  50  ROCK  0.87625533  KNEE  0.99904) 

(2907  of  0.533668  1  27  OF  0.533668  SOME  0.95681 155) 

(3249  did  0.3929135  10  78  IN  0.861815  IN  0.861815) 

(3762  buy  0.663005  1  4  BUY  0.663005  DAY  0.951262) 

(3249  any  0.16541149  1  135  KNEE  0.16541149  RUN  0.57370245) 

(4104  bright  0.42460397  38  319  BREAK  0.86566097  ARM  0.9831 12) 

(1881  red  0.054182995  32  639  RUN  0.61986834  RUN  0.89469945) 

(3249  his  0.035689  25  472  SICK  0.150853  EAT  0.95563495) 

(0  for  0  1  524  nil  0  EAT  0.836613) 

(1881  is  0.3676555  1 1  1 10  LIST  0.827277  SEA  0.965967) 

(3591  am  0.55402553  3  41  AN  0.810462  RAIN  0.9514865) 

(3762  and  0.134138  17  13HT  0.727708  KNIT  0.794451) 

(513  are  0.0644365  17  136  ART  0.47869247  ACT  0.61233234) 

(513  and  0.32305935  19  223  TRAIT  0.48860264  IT  0.9717835) 

(1710  are  0.401409  1  19  ARE  0.401409  AIR  0.6997295) 

(3762  spend  0.82212704  2  5  SPUN  0.9974505  SPUN  0.9974505) 

(1026  in  0.52595  1  43  IN  0.52595  MY  0.99193) 

(3249  miami  0.94332004  4  14  MY  0.988132  NUMB  0.9886405) 

(3249  is  0.52389604  1  16  IS  0.52389604  EASE  0.976816) 

(3078  be  0.936814  1  1  BE  0.936814  BE  0.936814) 

(3933  need  0.4806575  7  59  KNEE  0.941606  KNEE  0.941606) 

(6498  for  0.06872866  2  283  ROCK  0.073731996  SOME  0.904341) 

(1539  and  0.095671676  30  450  DOOR  0.5400325  RAW  0.81501305) 

(3762  in  0.5900825  2  1 1  KNICK-  0.6018045  EASE  0.828404) 

(4275  big  0.698352  1  1  BIG  0.698352  BIG  0.698352) 

(2394  can  0.03508467  8  379  KNEE  0.481827  BE  0.993945) 

(3591  be  0.991809  1  1  BE  0.991809  BE  0.991809) 

(3078  an  0.54280347  1  9  AN  0.54280347  BE  0.991342) 

(2907  sport  0.5554825  5  15  SPHERE  0.6081026  IS  0.943873) 

(4788  are  0.060823  8  533  ARC  0.492294  TELL  0.6391855) 

(1881  or  0.002651  31  403  ERA  0.715299  ERA  0.98236954) 

(0  proceeding  0  1  2498  nil  0  US  0.97203505) 

(3933  grows  0.63795096  8  17  ROLE  0.8424557  ROLE  0.8424557) 

(1881  he  0.580403  2  5  HIS  0.6197465  THE  0.774014) 
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(4617  from  0.49266002  3  1 1  RIM  0.509305  ARE  0.891 187) 

(4446  every  0.2982155  5  174  VERY  0.551038  IF  0.8537075) 

(2394  in  0.66372347  1  1  IN  0.66372347  IN  0.66372347) 

(3249  of  0.527506  1  13  OF  0.527506  ROAST  0.77329755) 

(0  cleans  0  1  1536  nil  0  AM  0.80486846) 

(4104  for  0.95127654  1  1  OR  0.95127654  OR  0.95127654) 

(4275  and  0.5518295  1 1  44  NUT  0.870758  TIM  0.881 1 1496) 

(855  are  0.794573  2  5  ARM  0.879726  RAY  0.982962) 

(4104  spend  0.96951437  1  4  SPEND  0.96951437  KNICK-  0.993262) 
(2907  in  0.85449004  2  6  KNICK-  0.8615955  MY  0.997667) 

(3420  miami  0.96991366  3  9  MY  0.994174  MY  0.994174) 

(1 197  is  0.1403235  4  178  EAR  0.4725025  RARE  0.618141) 

(3078  be  0.553444  1  18  BE  0.553444  IN  0.86775804) 

(2736  need  0.8220406  2  6  KNEE  0.8880335  KNEE  0.8880335) 

(0  for  0  1  535  nil  0  ARE  0.798531) 

(6840  and  0.09191799  10  429  AT  0.1302845  PA  0.702636) 

(2565  in  0.553236  1  5  IN  0.553236  IS  0.954143) 

(4959  big  0.95791996  1  1  BIG  0.95791996  BIG  0.95791996) 

(3249  can  0.85563505  1  2  CAN  0.85563505  BE  0.882378) 

(3249  be  0.468438  1  39  BE  0.468438  TEN  0.8422125) 

(2394  an  0.6909615  1  8  AN  0.6909615  BE  0.871572) 

(0  sport  0  1  1509  nil  0  PA  0.99963796) 

(4617  did  0.035262667  7  391  TEA  0.710287  HE  0.927991) 

(3933  buy  0.861672  1  2  BUY  0.861672  TEA  0.8809395) 

(4104  any  0.44171765  4  108  EAT  0.637507  BITES  0.904883) 

(4617  bright  0.3132487  38  309  RATE  0.621942  PAY  0.9675015) 
(2907  red  0.48889533  26  1 14  RAN  0.8559303  ATE  0.984944) 

(4104  his  0.013795  9  382  KIN  0.4816775  TEN  0.753637) 

(0  for  0  1  696  nil  0  LACK  0.71 12865) 

(7524  is  0.47378  7  82  ILL  0.4791505  CEASE  0.9783765) 

(4788  am  0.932438  2  2  MAD  0.952546  MAD  0.952546) 

(4959  and  0.19536799  3  130  DOLL  0.633171  OF  0.665208) 

(1710  are  0.2249815  2  58  ODD  0.48851 1  OF  0.974792) 

(0  retracted  0  1  2839  nil  0  ACT  0.999018) 

(3762  an  0.038595498  7  274  AM  0.4883115  IN  0.72176754) 

(3249  michael  0.767261  4  8  LACK  0.8324325  HELD  0.87600994) 
(3249  of  0.136959  1  149  OF  0.136959  ARC  0.766384) 

(3249  is  0.16013251  6  257  IN  0.48552552  NEAT  0.952827) 

(0  big  0  1  827  nil  0 1~D  0.812281) 

(3591  in  0.3982255  7  62  ILL  0.63869596  CAR  0.926167) 

(2907  customer  0.63639677  11  60  CAR  0.82696337  ELSE  0.992997) 
(4617  spilled  0.8980517  1  7  SPILLED  0.8980517  AT  0.98903954) 
(2565  some  0.549559  7  62  SEX  0.717819  LACK  0.8690795) 

(3762  please  0.19748099  13  294  WRIST  0.5066415  KEY  0.957343) 
(855  dig  0.2947655  24  166  AIN~T  0.6577785  LIE  0.813268) 
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(2052  my  0.363324  27  328  MARK  0.84687304  ART  0.9514485) 

(2907  ride  0.57271534  7  51  RAN  0.78348535  ALL  0.976126) 

(3591  in  0.190855  8  186  IT  0.4774005  AN  0.791743) 

(3420  all  0.757403  1  3  ALL  0.757403  CALL  0.8220907) 

(4275  are  0.134143  12  388  ARC  0.56705046  QUICK  0.6629677) 

(4617  expected  0.6278721  10  31  EXPECT  0.7600608  PECK  0.9012055) 
(2907  comes  0.48489702  1  84  COMES  0.48489702  OF  0.91 1512) 

(2736  for  0.3745065  24  143  SCAR  0.684433  MAY  0.857799) 

(0  his  0  1  609  nil  0  HIM  0.957626) 

(4617  broken  0.32881474  50  363  ROCK  0.83747166  ART  0.936368) 
(2907  call  0.9886445  1  1  CALL  0.9886445  CALL  0.9886445) 

(513  an  0.55015695  4  15  LACK  0.8940275  ALL  0.966589) 

(3078  for  0.50967395  1  38  OR  0.50967395  OF  0.882862) 

(4617  can  0.878659  4  4  AT  0.913415  AT  0.913415) 

(4959  buy  0.998515  1  1  BUY  0.998515  BUY  0.998515) 

(2565  mine  0.63868666  6  36  MIGHT  0.95347065  MY  0.991084) 

(1881  planned  0.30762476  8  474  PLANT  0.469722  LIE  0.981826) 

(3078  is  0.518966  2  20  EASE  0.5199305  ALL  0.999285) 

(4446  every  0.65741 1  1  4  EVERY  0.65741 1  ERA  0.828153) 

(513  related  0.4598612  3  218  SHRIEKED  0.63513535  MY  0.99985) 
(4104  my  0.999866  1  1  MY  0.999866  MY  0.999866) 

(2907  ability  0.5880596  1  81  ABILITY  0.5880596  PICK  0.9721955) 
(4275  is  0.932337  1  1  IS  0.932337  IS  0.932337) 

(0  expense  0  1  1730  nil  0  BACKED  0.554895) 

(5130  his  0.060806002  16  640  IN  0.18996951  OF  0.742972) 

(3078  he  0.6044855  1  9  HE  0.6044855  IT  0.941069) 

(3591  street  0.64767903  2  23  STAY  0.69518054  LAST  0.97051346) 
(4788  needed  0.33841646  13  263  NEED  0.50640696  LIE  0.904281) 
(4104  be  0.604895  2  4  DID  0.64988554  OF  0.74373) 

(0  elegant  0  1  1533  nil  0  ILL  0.98510647) 

(5985  or  0.296297  1  123  OR  0.296297  SEA  0.951602) 

(4446  from  0.6363135  1  10  FROM  0.6363135  NUMB  0.9409675) 

(4959  answer  0.66709536  4  16  ANTS  0.78353834  EAT  0.95737696) 

(0  did  0  1  426  nil  0  BUY  0.991301) 

(4104  buy  0.99261 1  2  2  BUYING  0.994764  BUYING  0.994764) 

(4446  any  0.431944  2  68  BE  0.4329035  KNEE  0.80265796) 

(0  bright  0  1  1584  nil  0  RAY  0.9772105) 

(0  red  0  1  669  nil  0  RAY  0.99966) 

(1710  his  0.583349  1  19  HIS  0.583349  IS  0.956598) 

(0  for  0  1  563  nil  0  IN  0.9903475) 

(2052  is  0.7320885  7  8  ILL  0.885158  ILL  0.885158) 

(3933  am  0.736646  11  18  AN  0.961191  NUMB  0.9784835) 

(3762  and  0.15436698  13  245  MAIL  0.508257  ILL  0.96893) 

(6156  are  0.2802425  1  220  ARE  0.2802425  TEA  0.797853) 

(2565  forces  0.4873885  6  157  OFFER  0.50261  SAYS  0.99343395) 
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(0  from  0  1  704  nil  0  COME  0.648937) 

(1368  lily  0.376172  5  111  LID 0.444748  LEAD 0.81766903) 

(5985  are  0.713514  1  6  ARE  0.713514  CAT  0.7549135) 

(3591  of  0.289176  1  97  OF  0.289176  ROAM  0.8430285) 

(1 197  come  0.726033  4  4  CAT  0.9088  CAT  0.9088) 

(2565  makes  0.6123265  4  8  AM  0.856255  AM  0.856255) 

(3933  forces  0.5858515  13  125  OUGHT  0.9883095  OUGHT  0.9883095) 
(4959  from  0.20107101  10  336  IS  0.543993  IS  0.82086253) 

(342  lily  0.48904565  3  35  KNEE  0.531826  LEAD  0.8587295) 

(4788  are  0.1889905  1  171  ARE  0.1889905  OF  0.676669) 

(3762  of  0.099152  16  442  VERY  0.3739163  BUY  0.891735) 

(2736  come  0.7432185  1  2  COME  0.7432185  DID  0.8234095) 

(2565  makes  0.66373205  5  21  AM  0.965889  RACK  0.98733056) 

(6498  and  0.047423664  33  1007  DAY  0.524421  PAIN  0.955674) 

(1026  sprained  0.84680504  4  4  SPRAY  0.890635  SPRAY  0.890635) 

(4617  steep  0.74295264  2  12  TEA  0.8021105  EATING  0.927511) 

(6840  can  0.2297025  9  152  CALL  0.62123  RAW  0.965253) 

(2907  be  0.4173385  2  45  TEASED  0.50234354  RUN  0.8956425) 

(3078  diseases  0.54496515  5  54  TEASED  0.88941234  EASED  0.9972015) 
(5130  are  0.154454  6  354  RUN  0.29224348  TORE  0.574553) 

(1539  in  0.5547795  1  7  IN  0.5547795  AM  0.872377) 

(0  for  0  1  518  nil  0  TRUCK  0.78511834) 

(2394  all  0.672967  3  9  CALLED  0.73144895  TRUCK  0.8675534) 

(3249  he  0.86329997  1  1  HE  0.86329997  HE  0.86329997) 

(7524  lie  0.324414  13  340  NEED  0.9220735  NEED  0.9220735) 

(5130  all  0.98153  1  1  ALL  0.98153  ALL  0.98153) 

(0  once  0  1  1690  nil  0  LOTS  0.79945636) 

(2907  cory  0.9338161  3  8  CAR  0.99947953  CAR  0.99947953) 

(1881  and  0.24124269  18  301  CAN-T  0.73680097  EAT  0.8592965) 

(2565  played  0.19291966  19  535  PARK  0.582351  SET  0.655815) 

(3933  for  0.2529605  1  106  OR  0.2529605  DOLL  0.962564) 

(2907  right  0.50427634  30  1 16  RAY  0.987169  THE  0.991819) 

(2736  may  0.4712895  8  54  KNICK-  0.623938  ERA  0.84747005) 

(0  be  0  1  463  nil  0  ELM  0.980568) 

(0  for  0  1  563  nil  0  ARM  0.9719835) 

(2907  business  0.41285002  5  115  BE  0.7109755  TEA  0.76680195) 

(0  retracted  0  1  2477  nil  0  RACK  0.8348255) 

(3249  an  0.6742895  1  2  AN  0.6742895  THE  0.9671825) 

(3762  michael  0.952039  1  2  MICHAEL  0.952039  HE  0.9617035) 

(3078  of  0.507769  1  8  OF  0.507769  THE  0.605966) 

(3078  is  0.53643  1  7  EARS  0.53643  OF  0.7659205) 

(4275  big  0.885615  3  4  EAR  0.936961  ALL  0.988824) 

(171  in  0.356783  1  12  IN  0.356783  SHE  0.7273195) 

(0  combine  0  1  2335  nil  0  THE  0.90789354) 

(701 1  all  0.998606  1  1  ALL  0.998606  ALL  0.998606) 
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(2394  in  0.176769  4  97  ILL  0.275739  PILL  0.727497) 

(4788  by  0.465516  2  20  IN  0.4910425  AN  0.71963596) 

(2223  and  0.01 1641  38  478  HOARSE  0.3187473  EDGES  0.4267713) 

(4275  stray  0.511911  16  41  TRAIN  0.77393436  TRAIN  0.77393436) 

(4275  for  0.0959965  4  301  OFFER  0.335296  SON  0.5828135) 

(4959  one  0.52649736  1  19  DONE  0.52649736  SON  0.771438) 

(3420  of  0.365346  1  47  OF  0.365346  TEA  0.7059855) 

(4959  buy  0.968668  1  1  BUY  0.968668  BUY  0.968668) 

(3762  be  0.887181  1  1  BE  0.887181  BE  0.887181) 

(3933  right  0.43724  17  147  RATE  0.77861863  HATE  0.989503) 

(9234  hand  0.30422333  9  240  AN  0.600215  ATE  0.99579) 

(2052  are  0.06934  9  260  LOCK  0.546589  IS  0.934481) 

(0  for  0  1  712  nil  0  IN  0.68795455) 

(5130  quick  0.44917202  5  97  WICK  0.54375  ODD  0.8171425) 

(2565  in  0.446807  2  17  AN  0.45878  THE  0.664602) 

(513  in  0.4423485  2  50  RIM  0.472088  AM  0.985919) 

(0  consists  0  1  1037  nil  0  TRISH  0.904453) 

(3591  of  0.181971  2  146  PAW  0.185603  SCORE  0.5949255) 

(855  and  0.07346133  10  365  DREAM  0.409912  STEAM  0.6959022) 

(6156  are  0.464872  1  26  ARE  0.464872  LAY  0.799168) 

(0  for  0  1  673  nil  0  TRY  0.6944185) 

(4788  all  0.754036  1  4  ALL  0.754036  CRY  0.91946) 

(855  he  0.7957715  1  3  HE  0.7957715  WE  0.955418) 

(4104  lie  0.994933  1  1  LIE  0.994933  LIE  0.994933) 

(4959  all  0.873187  1  2  ALL  0.873187  AL  0.910257) 

(3078  once  0.256932  12  352  WALLS  0.34868467  LIQUIDS  0.75837594) 

(2736  cory  0.98354274  1  2  CORY  0.98354274  OR  0.989743) 

(4617  and  0.060024668  1  255  AND  0.060024668  OR  0.987819) 

(2736  played  0.61571497  14  43  PLATE  0.9425706  PLATE  0.9425706) 

(1710  for  0.081073  3  390  SHE  0.562178  ALL  0.943817) 

(2394  right  0.34028998  47  187  RAIN  0.788333  EAR  0.906577) 

(2736  may  0.47042698  16  69  NEAR  0.72723365  IN  0.94518) 

(1710  be  0.695474  1  1  BE  0.695474  BE  0.695474) 

(0  for  0  1  742  nil  0  EFFORT  0.51267797) 

(4617  business  0.53689677  8  29  IT  0.971018  IT  0.971018) 

(4788  intelligence  0.75730616  1  13  INTELLIGENCE  0.75730616  TELL  0.98859894) 
(3078  is  0.54569  1 1  38  IF  0.9826335  IF  0.9826335) 

(3249  for  0.34002  5  137  KEY  0.49852  IF  0.916954) 

(0  become  0  1  797  nil  0  IT  0.985831) 

(0  artists  0  1  1746  nil  0  ARE  0.995518) 

(5130  fish  0.9854655  1  1  HSH  0.9854655  HSH  0.9854655) 

(3933  leap  0.906996  1  2  LEAP  0.906996  SLEEP  0.91348) 

(5301  frantically  0.70201045  4  40  RAN  0.90828234  LEAP  0.9646855) 

(3078  of  0.8867855  1  2  OF  0.8867855  THE  0.96582496) 

(2907  he  0.7667085  1  1  HE  0.7667085  HE  0.7667085) 
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(2736  from  0.429165  3  28  THE  0.46950498  THE  0.80985796) 

(0  every  0  1  1235  nil  0  ME  0.74651897) 

(2565  in  0.7665295  1  1  IN  0.7665295  IN  0.7665295) 

(3249  of  0.130146  4  357  DUMPED  0.25581574  AT  0.9835425) 

(6156  cleans  0.32329035  2  265  CLEANED  0.332871  WE  0.932972) 
(3933  for  0.43159348  7  70  ALL  0.653118 1~LL  0.8316715) 

(4959  are  0.951604  1  1  ARE  0.951604  ARE  0.951604) 

(3762  all  0.47815698  1  34  ALL  0.47815698  BUY  0.982248) 

(5301  errors  0.516481  16  63  RAN  0.8071805  DANCE  0.844431) 
(2736  nearest  0.721354%  4  16  NEAR  0.7779653  IN  0.953043) 

(4788  may  0.979853  1  1  MAY  0.979853  MAY  0.979853) 

(2907  be  0.706193  1  7  BE  0.706193  ROB  0.7864735) 

(3420  distance  0.43981838  1 1  150  EAR  0.500403  NEAT  0.9200455) 
(2907  buys  0.96787345  1  1  EYES  0.96787345  EYES  0.96787345) 

(0  catastrophic  0  1  2498  nil  0  ROCK  0.975049) 

(3762  expense  0.3953542  8  235  EXQUISITE  0.525064  IS  0.8629875) 
(5301  his  0.31044832  6  253  IS  0.625806  KNEE  0.929264) 

(0  he  0  1  650  nU  0  KNIT  0.7341735) 

(4104  street  0.536439  7  77  TOWARD  0.64848  SLEIGH  0.81821) 

(0  needed  0  1  682  nil  0  ARE  0.919893) 

(3762  be  0.8684  1  2  BE  0.8684  COME  0.9096455) 

(0  elegant  0  1  1302  nil  0  ILL  0.943716) 

(6840  or  0.083723  1  286  OR  0.083723  LINE  0.658156) 

(4446  from  0.7215745  6  22  SON  0.97371495  SON  0.97371495) 

(2394  answer  0.58966035  7  15  CASH  0.90027654  EASE  0.96793103) 
(1026  is  0.508889  1  4  IS  0.508889  WAS  0.6000145) 

(0  for  0  1  631  nil  0  SEA  0.953652) 

(4104  ended  0.096683  10  481  DIM  0.410102  IN  0.9259525) 

(1368  please  0.7606743  5  15  PLAY  0.93818855  KEY  0.97387695) 
(1539  cleaners  0.617023  2  24  CLEANED  0.64767736  IF  0.997653) 
(4104  for  0.31277534  2  1 10  RAIN  0.33409402  KEY  0.730406) 

(684  cleaners  0.35479873  11  278  THEY  0.540189  SAY  0.875486) 
(342  and  0.21525735  24  229  PA  0.5436815  KNIT  0.6457385) 

(342  are  0.159551  2  122  LOT  0.3031045  AT  0.570912) 

(4446  spend  0.58536196  1  7  SPEND  0.58536196  MY  0.977887) 

(684  in  0.3691385  3  22  ARM  0.5072465  MY  0.991722) 

(3933  miami  0.8216823  6  1 1 1~D  0.938338  AM  0.995042) 

(2736  is  0.094373  1 1  279  IF  0.5052105  EAR  0.7667885) 

(3249  be  0.243758  2  1 10  ILL  6.357422  AND  0.72485495) 

(4104  need  0.433662  7  100  EAT  0.760976  ALL  0.850363) 

(5643  for  0.1402585  9  249  BILL  0.7864275  ILL  0.979928) 

(0  and  0  1  989  nil  0  PA  0.8785385) 

(3591  in  0.093512  2  119  IT  0.1096295  STAB  0.6522033) 

(4446  big  0.9960965  1  1  BIG  0.9960965  BIG  0.9960965) 

(342  can  0.34459502  3  94  CAB  0.5090985  BE  0.991 152) 
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(4275  be  0.983737  1  1  BE  0.983737  BE  0.983737) 

(1881  an  0.38184598  4  20  AIR  0.404366  BE  0.950224) 

(4275  sport  0.741544  2  4  PAW  0.930216  PAW  0.930216) 

(855  and  0.23219167  34  564  ACTS  0.64047736  BUY  0.978359) 

(0  sprained  0  1  850  nil  0  KNEE  0.94786847) 

(4788  steep  0.64166135  1  12  STEEP  0.64166135  TEA  0.989935) 

(2736  can  0.60271305  3  42  CALL  0.837999  ART  0.948496) 

(3762  be  0.983603  1  1  BE  0.983603  BE  0.983603) 

(2907  diseases  0.6263478  1  41  DISEASES  0.6263478  TEA  0.99168146) 
(5814  are  0.198397  1  249  ARE  0.198397  IS  0.890758) 

(1368  in  0.881668  5  12  IT  0.90067303  SEA  0.997706) 

(4788  is  0.387555  1  6  IS  0.387555  KIN  0.5595895) 

(0  for  0  1  375  nil  0  SEAM  0.531945) 

(1881  ended  0.46478274  2  20  DAMP  0.47975636  ROAM  0.688957) 

(4104  please  0.23349565  12  378  LEAST  0.794778  KICK  0.95276654) 
(4788  cleaners  0.979002  1  1  CLEANERS  0.979002  CLEANERS  0.979002) 
(4275  for  0.7545165  3  5  OUGHT  0.86524546  OUGHT  0.86524546) 

(5130  cleaners  0.9586566  1  3  CLEANERS  0.9586566  KNEES  0.9913635) 
(0  for  0  1  684  nil  0  SCORE  0.7727067) 

(2907  all  0.94479597  2  4  WALL  0.9570955  CALL  0.962341) 

(2736  he  0.48316  3  17  HIM  0.858088  HIM  0.858088) 

(2736  lie  0.988847  1  1  LIE  0.988847  LIE  0.988847) 

(4446  all  0.830073  1  4  ALL  0.830073  WE  0.877755) 

(0  once  0  1  1081  nil  0  ARE  0.9771 18) 

(2907  cory  0.6680763  10  57  KEY  0.980706  KEY  0.980706) 

(4104  and  0.32410467  1  206  AND  0.32410467  AIN~T  0.9538005) 

(2736  played  0.74045366  6  34  PLAY  0.9614675  PLAY  0.9614675) 

(0  for  0  1  638  nil  0  PA  0.907265) 

(2565  right  0.26723933  35  316  RAN  0.586568  MAY  0.972328) 

(2565  may  0.5661645  9  34  MAIN  0.70082766  AN  0.92523646) 

(2736  be  0.571  5  8  IN  0.7245845  IN  0.7245845) 

(513  for  0.028266998  8  396  RIM  0.35361966  MAP  0.5960065) 

(0  business  0  1  718  nil  0  IT~S  0.78192) 

(4617  cat  0.543817  2  83  CAN~T  0.6814  TEA  0.9534945) 

(3078  my  0.030406  10  637  MIGHTY  0.34338033  TEA  0.91964996) 

(4104  by  0.081496  7  352  ILL  0.45987698  ILL  0.860828) 

(2736  of  0.007097  2  499  UNTIL  0.039967  AN  0.513049) 

(3420  may  0.99853  1  1  MAY  0.99853  MAY  0.99853) 

(2223  money  0.40653732  1  63  MONEY  0.40653732  END  0.7150555) 

(3933  by  0.018854  15  757  I~VE  0.376524  OR  0.969039) 

(0  hard  0  1  1086  nil  0  AND  0.73307645) 

(4275  candy  0.83523226  7  9  AN  0.9264285  KNEE  0.9562905) 

(6840  are  0.486985  1  42  ARE  0.486985  EASE  0.986933) 

(2223  or  0.10115901  10  221  ILL  0.200308  PAY  0.5102995) 

(0  proceeding  0  1  1693  nil  0  SEA  0.991257) 
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(4275  grows  0.83805865  2  2  GROW  0.97479796  GROW  0.97479796) 
(0  for  0  1  519  nil  0  CALL  0.69420666) 

(5814  all  0.957863  1  2  ALL  0.957863  CALL  0.9763005) 

(2565  he  0.95453846  1  1  HE  0.95453846  HE  0.95453846) 

(3420  lie  0.9928765  1  2  LIE  0.9928765  ERA  0.9950775) 

(6327  all  0.1294955  4  119  BUY  0.555606  LOCK  0.6948565) 

(3249  once  0.5076793  5  106  LIKE  0.5930154  ICE  0.9866255) 

(2907  cory  0.92021227  3  13  CAR  0.983148  CAR  0.983148) 

(1539  and  0.607169  10  10  AN  0.91 18355  AN  0.91 18355) 

(3591  played  0.52258533  6  69  PLANE  0.785145  LAKE  0.981653) 

(0  for  0  1  457  nil  0  ALL  0.836136) 

(2907  right  0.387939  22  142  RARE  0.8179307  MAY  0.97761 1) 

(3420  may  0.999401  1  1  MAY  0.999401  MAY  0.999401) 

(2907  be  0.945499  1  1  BE  0.945499  BE  0.945499) 

(0  for  0  1  321  nil  0  MAY  0.960168) 

(4275  business  0.3376125  19  302  EAR  0.81403  MAY  0.876378) 

(3933  made  0.812132  2  4  MAY  0.982031  MAY  0.982031) 

(3591  of  0.518254  1  18  OF  0.518254  MAY  0.991932) 

(3420  are  0.371087  4  45  ARM  0.545964  MAY  0.720841) 

(2394  in  0.9179735  1  2  IN  0.9179735  THE  0.93517303) 

(0  stockings  0  1  2010  nil  0  PA  0.99738646) 

(6327  run  0.30176634  16  298  HEN  0.55641 13  CRACK  0.9660407) 
(1881  dishes  0.7509491  5  11  IS  0.7929945  ARE  0.968663) 

(2736  makes  0.74032825  5  27  MAKE  0.980865  FIX  0.998897) 

(5301  and  0.086665  6  495  TEA  0.5187015  TEA  0.982317) 

(2907  call  0.9748075  1  1  CALL  0.9748075  CALL  0.9748075) 

(7011  an  0.36097398  1  24  AN  0.36097398  ALL  0.883982) 

(0  for  0  1  839  nil  0  IF  0.9961 1247) 

(5130  can  0.8594225  1  1  AN  0.8594225  AN  0.8594225) 

(3933  buy  0.027052  18  624  MY  0.990647  MY  0.990647) 

(3933  mine  0.2143765  11  345  MY  0.422058  EASE  0.998759) 

(3762  planned  0.6314247  4  37  LAST  0.7199361  CLASS  0.93019533) 
(1710  is  0.4100755  3  20  LAWS  0.5038235  OF  0.803567) 

(2907  did  0.35356867  14  1 14  DIM  0.6404503  THE  0.7269925) 

(3933  buy  0.500594  2  9  BUYER  0.58126104  BE  0.983645) 

(4104  any  0.5482805  5  19  KNEE  0.815567  EAT  0.8756125) 

(4446  bright  0.5343033  19  55  RAY  0.9957335  RAY  0.9957335) 

(2907  red  0.20499833  27  488  RACK  0.741843  RACK  0.9660325) 
(4275  his  0.009072334  4  466  ATTACK  0.13780867  TEA  0.91 177046) 
(0  for  0  1  633  nil  0  ATE  0.813577) 

(2394  is  0.4924755  13  76  IT  0.74527097  SEA  0.995898) 

(2394  am  0.170798  13  130  AN  0.6440675  AN  0.6440675) 

(4104  and  0.12565534  6  135  LET  0.3225835  IS  0.912251) 

(3420  are  0.008394  9  376  ART  0.4833155  RIGHT  0.6433345) 

(4104  is  0.621084  1  4  IS  0.621084  HIS  0.79905045) 
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(0  for  0  1  697  nil  0  KEY  0.9709805) 

(6327  ended  0.48233074  1  60  ENDED  0.48233074  WICK  0.91735256) 
(4275  please  0.135144  16  440  LEAST  0.64782196  KEY  0.905193) 
(4959  cleaners  0.8101 1 134  4  15  LEAF  0.8775694  TEAM  0.88858104) 
(4617  for  0.9776345  1  2  OR  0.9776345  OR  0.978098) 

(1710  cleaners  0.5556795  6  17  KEY  0.972677  KEY  0.972677) 

(0  combine  0  1  1734  nil  0  BUY  0.997758) 

(4104  all  0.676671  1  2  ALL  0.676671  KNEE  0.8767795) 

(2736  in  0.258644  8  1 12  TELL  0.4616895  IS  0.726613) 

(2907  by  0.99858  1  1  BUY  0.99858  BUY  0.99858) 

(2052  and  0.15501633  28  539  DISK  0.55520004  SIT  0.8692895) 

(2736  stray  0.757498  10  41  STATE  0.98709327  STATE  0.98709327) 
(4788  for  0.333365  5  55  ALL  0.695916  ARE  0.791452) 

(5301  one  0.3380413  8  162  KNIT  0.632366  IT  0.9395145) 

(1368  of  0.4771635  4  17  LOCK  0.65582347  LOCK  0.65582347) 

(2736  buy  0.986681  1  1  BUY  0.986681  BUY  0.986681) 

(3762  be  0.999554  1  1  BE  0.999554  BE  0.999554) 

(3933  right  0.55758363  18  68  RAN  0.9056115  AN  0.967952) 

(5301  hand  0.5447083  1 1  163  ANT  0.888577  RIGHT  0.97379446) 

(3591  forces  0.28473848  33  550  RUN  0.57612866  DRESS  0.93540764) 
(4275  from  0.52706397  3  9  WROTE  0.59593964  PINK  0.636899) 

(2394  lily  0.8609307  2  4  ALL  0.910868  ARE  0.915687) 

(171  are  0.635702  1  1  ARE  0.635702  ARE  0.635702) 

(2907  of  0.46953952  2  4  WAS  0.599787  STAR  0.607177) 

(342  come  0.44964153  32  77  ANT  0.7045635  CALM  0.9035285) 

(2736  makes  0.6417075  13  77  MATE  0.8703527  ATE  0.9665415) 

(0  forces  0  1  1 175  nil  0  OR  0.956887) 

(0  from  0  1  779  nil  0  SIT  0.86364055) 

(2394  lily  0.59844637  4  12  ALL  0.889877  LEAD  0.8959745) 

(4959  are  0.776697  1  2  ARE  0.776697  CAR  0.8382215) 

(3420  of  0.423103  1  15  OF  0.423103  ARE  0.822638) 

(684  come  0.687477  3  6  CAT  0.691655  ARE  0.78350854) 

(3933  makes  0.600615  1  41  MAKES  0.600615  KEY  0.810822) 

(3591  intelligence  0.5893994  4  62  TELL  0.74492365  TELL  0.99582946) 
(3249  is  0.2164775  11  259  IF  0.5697995  ELSE  0.68548954) 

(4446  for  0.6213245  1  27  OR  0.6213245  KEY  0.94128895) 

(0  become  0  1  1 189  nil  0  ILL  0.990353) 

(0  artists  0  1  2394  nil  0  ARE  0.999325) 

(5130  fish  0.5410315  5  28  IN  0.85187304  THIS  0.914978) 

(2907  leap  0.683645  4  16  LAUGH  0.902829  LAUGH  0.91535604) 

(0  frantically  0  1  3172  nil  0  LAY  0.99783) 

(6840  of  0.153429  1  137  OF  0.153429  THE  0.91 197395) 

(0  combine  0  1  1084  nil  0  PA  0.9498085) 

(5130  all  0.676796  1  10  ALL  0.676796  ARC  0.7818275) 

(3420  in  0.045256503  9  256  AM  0.383708  AM  0.8640535) 
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(3078  by  0.582103  3  27 1~M  0.7812395  ELM  0.91207004) 

(3762  and  0.0899645  3  227  WAS  0.46108952  IS  0.519884) 

(4959  stray  0.4495805  6  159  TRAIN  0.61635566  RAIN  0.9146035) 

(5472  for  0.48567548  2  33  LAW  0.61257696  ARE  0.949331) 

(0  one  0  1  719  nil  0  OR  0.894329) 

(1197  of  0.6281405  1  3  OF  0.6281405  EYES  0.6535665) 

(3249  buy  0.043296  1 1  798  ALARM  0.44281274  ARE  0.999757) 

(3420  be  0.006503  15  807  KNEE  0.497184  ILL  0.995594) 

(3591  right  0.552795  1  6  RIGHT  0.552795  RAY  0.77944) 

(0  hand  0  1  1213  nil  0  EYES  0.739716) 

(3933  every  0.878931  1  1  EVERY  0.878931  EVERY  0.878931) 

(1881  related  0.2250042  35  790  CLEAN  0.7147794  ICE  0.9258395) 

(3591  my  0.950697  3  3  MICHAEL  0.9564165  MICHAEL  0.9564165) 

(3078  ability  0.686094  4  42  OF  0.845482  ILL  0.971603) 

(4617  is  0.224229  1  148  IS  0.224229  EASE  0.972531) 

(2736  call  0.9627123  1  1  CALL  0.9627123  CALL  0.9627123) 

(4788  an  0.368128  1  36  AN  0.368128  ON  0.74064803) 

(0  for  0  1  599  nil  0  MAY  0.970088) 

(6156  can  0.688847  1  21  AN  0.688847  ARE  0.985821) 

(3078  buy  0.0122535005  19  717  DARK  0.521023  I~LL  0.905662) 

(4275  mine  0.954407  1  2  MINE  0.954407  EASE  0.9841 16) 

(3591  planned  0.52060133  3  26  PLANES  0.59975535  HE  0.971161) 

(5643  is  0.4661285  1  36  IS  0.4661285  OR  0.966919) 

(6840  all  0.574464  1  47  ALL  0.574464  ARC  0.902798) 

(3762  are  0.265799  3  111  ARC  0.595085  KNICK-  0.6949625) 

(4104  expected  0.66305506  17  41  SON  0.9906335  SON  0.9906335) 

(0  comes  0  1  1040  nil  0  SKILLED  0.7625985) 

(4275  for  0.45202053  1  26  OR  0.45202053  IF  0.681062) 

(0  his  0  1  332  nil  0  THE  0.838398) 

(0  broken  0  1  1539  nil  0  TELL  0.918154) 

(4959  cat  0.77959204  3  14  AN  0.905579  WICK  0.94805396) 

(3420  my  0.905172  2  5  MIKE  0.9492355  LIKE  0.977201) 

(3933  by  0.916555  1  1  BUY  0.916555  BUY  0.916555) 

(2736  of  0.1464065  1  134  OF  0.1464065  ERA  0.86905) 

(3420  may  0.938518  1  1  MAY  0.938518  MAY  0.938518) 

(2736  money  0.94886535  1  4  MONEY  0.94886535  BUY  0.980169) 

(4446  by  0.984158  1  1  BUY  0.984158  BUY  0.984158) 

(0  hard  0  1  1323  nil  0  ARE  0.982227) 

(855  candy  0.627506  1  10  CANDY  0.627506  CAST  0.8054465) 

(4959  all  0.935452  1  1  ALL  0.935452  ALL  0.935452) 

(3762  are  0.47440952  8  242  LOCK  0.93543696  ILL  0.988291) 

(4104  expected  0.5802528  11  98  SECTOR  0.859001  STREETS  0.95590496) 
(2736  comes  0.33940268  9  132  DUMB  0.7059555  SOME  0.730124) 

(4275  for  0.183483  14  231  FROM  0.58310133  MAY  0.958797) 

(0  his  0  1  368  nil  0  THIS  0.849543) 
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(4275  broken  0.7551587  8  29  ROCK  0.9736133  ROCK  0.9736133) 

(4275  and  0.23999232  9  135  TEA  0.4809325  KNIT  0.865391) 

(2565  are  0.21095149  2  43  DONE  0.47918102  SON  0.62240255) 

(3933  spend  0.606105  4  12  SPENT  0.680036  TEA  0.7287055) 

(342  in  0.494933  1  25  IN  0.494933  TEA  0.748973) 

(3762  miami  0.60861135  3  8  MY  0.856976  KNEE  0.872054) 

(855  is  0.488251  2  18  IF  0.4987405  OFFER  0.924778) 

(3078  be  0.538946  1  5  BE  0.538946  ILL  0.757727) 

(2394  need  0.44443896  13  39  KNEE  0.6850685  IN  0.7775815) 

(2052  for  0.14654833  1  148  DOOR  0.14654833  KNEE  0.929848) 

(684  and  0.25001967  4  252  DRAIN  0.32404667  TRAY  0.906953) 

(3591  in  0.572753  1  13  IN  0.572753  SON  0.643328) 

(4104  big  0.507886  3  37  EAR  0.5222395  TAN  0.97978055) 

(5130  can  0.22817801  8  312  KNEE  0.5252335  TEA  0.9895445) 

(3591  be  0.285808  2  56  BEING  0.42581734  TEA  0.991212) 

(2907  an  0.3107515  1  50  AN  0.3107515  EASE  0.693688) 

(4617  sport  0.317252  4  148  PAM  0.375765  LINE  0.672041) 

(4446  and  0.01 1882334  21  1 124  SON  0.430753  NONE  0.8747025) 

(1539  sprained  0.84626395  1  4  SPRAINED  0.84626395  RAY  0.959778) 
(4617  steep  0.35940468  13  117  TEA  0.69522953  THE  0.7606315) 

(4446  can  0.59986  4  19  AL  0.861 14204  AL  0.96231604) 

(3420  be  0.999426  1  1  BE  0.999426  BE  0.999426) 

(3420  diseases  0.6209614  1  53  DISEASES  0.6209614  EASE  0.977815) 
(2052  are  0.4026315  2  51  SPHERE  0.6072603  IN  0.926581) 

(2736  in  0.5474965  1  13  IN  0.5474965  SEA  0.999077) 

(4446  expense  0.51346856  19  137  SIN  0.8717145  PINS  0.893044) 

(4788  his  0.319396  4  103  IS  0.865545  ME  0.903553) 

(0  he  0  1  631  nil  0  TEA  0.65759003) 

(3762  street  0.74414533  5  18  TIM  0.963028  LOCK  0.979748) 

(4959  needed  0.6532647  2  3  NEED  0.841 1895  NEED  0.841 1895) 

(3762  be  0.885067  1  3  BE  0.885067  TEA  0.965696) 

(0  elegant  0  1  1052  nil  0  TELL  0.69094145) 

(3249  or  0.005577  9  681  ROCKY  0.07812  AM  0.962452) 

(0  from  0  1  929  nil  0  COME  0.795348) 

(5472  answer  0.52433866  4  8  ENTER  0.61998665  IN  0.76850295) 

(5643  expense  0.53576285  17  129  CENT  0.724042  STREETS  0.9743345) 
(5643  his  0.20793499  1  452  HIS  0.20793499  SICK  0.9836275) 

(0  he  0  1  301  nil  0  EAT  0.9774895) 

(4104  street  0.35360527  42  226  TRUCK  0.6964248  ROCK  0.7983915) 
(3420  needed  0.09695434  7  510  DAMP  0.33837032  LIE  0.880581) 

(4275  be  0.55266  1  23  BE  0.55266  HATE  0.8016355) 

(0  elegant  0  1  1276  nil  0  BIKE  0.8145315) 

(0  or  0  1  778  nil  0  RAW  0.55914503) 

(3933  from  0.71010053  1  3  FROM  0.71010053  NUMB  0.835498) 

(3249  answer  0.6668777  1  16  ANSWER  0.6668777  SEA  0.908997) 
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(0  customer  0  1  1636  nil  0  US  0.882981) 

(4446  spilled  0.728777  3  25  PILLS  0.9781647  PILLS  0.9781647) 

(1881  some  0.128445  8  320  ACHES  0.5121035  ILL  0.998098) 

(4617  please  0.30018133  12  310  LEAST  0.8270085  LEAST  0.8270085) 
(3420  dig  0.133525  12  369  DOWN  0.20146699  STICK  0.72370434) 

(0  my  0  1  653  nil  0  TIN  0.8300905) 

(2736  ride  0.680566  6  13  RUN  0.8325457  RUN  0.8325457) 

(1197  in  0.477701  4  5  EAR  0.49816298  EAR  0.49816298) 

(5643  cat  0.7185525  4  34  AIR  0.997844  AIR  0.997844) 

(3249  my  0.986694  1  1  MY  0.986694  MY  0.986694) 

(3078  by  0.114378  14  534  ARC  0.47910097  AN  0.8361265) 

(5130  of  0.017055  3  1113  VAIN  0.232869  KNEE  0.989124) 

(4959  may  0.6759225  1  1  MATE  0.6759225  MATE  0.6759225) 

(2565  money  0.48166335  29  88  MY  0.9084255  BUY  0.976346) 

(4446  by  0.889765  1  2  BUY  0.889765  DOWN  0.9772785) 

(0  hard  0  1  2162  nil  0  ARE  0.814149) 

(3933  candy  0.6633402  5  14  CAN  0.7901676  DARNED  0.81916165) 
(4446  every  0.659197  1  1  EVERY  0.659197  EVERY  0.659197) 

(1539  related  0.2635196  32  720  ROCKETS  0.6214252  LET  0.972278) 
(4104  my  0.986151  1  1  MY  0.986151  MY  0.986151) 

(3762  ability  0.37150103  21  215  BELT  0.72267103  IT  0.905287) 

(4104  is  0.998903  1  1  IS  0.998903  IS  0.998903) 

(0  of  0  1  554  nil  0  DISH  0.63980263) 

(0  and  0  1  567  nil  0  EAT  0.9210725) 

(3762  prevented  0.1 136724  25  882  PRETTY  0.5491807  KNEE  0.721 152) 
(0  from  0  1  695  nil  0  AM  0.930971) 

(0  contains  0  1  1909  nil  0  ATE  0.966529) 

(4788  for  0.06732967  7  318  UP  0.13735  SPREAD  0.49306765) 

(684  all  0.093182  34  218  ITSELF  0.304359  PA  0.4298955) 

(3078  he  0.9281375  1  1  HE  0.9281375  HE  0.9281375) 

(4617  lie  0.439317  3  46 1~D  0.6979505  ERRORS  0.92038) 

(2394  all  0.328298  3  72  ILL  0.3343845  DAY  0.7741255) 

(0  once  0  1  1470  nil  0  EYES  0.85753703) 

(2907  cory  0.86084336  3  6  KEY  0.9065755  THE  0.9458435) 

(2907  and  0.1449045  22  385  CALF  0.5343795  TEA  0.918062) 

(4617  played  0.7413783  10  42  LAND  0.8472547  AN  0.983235) 

(0  for  0  1  944  nil  0  SEND  0.5886887) 

(4275  right  0.7067585  2  14 1~LL  0.8422365  CRY  0.954267) 

(4446  may  0.993221  1  1  MAY  0.993221  MAY  0.993221) 

(2907  be  0.069638  3  390  BEING  0.29579452  AM  0.938026) 

(0  for  0  1  507  nil  0  MAY  0.955057) 

(2736  business  0.69174397  1  16  BUSINESS  0.69174397  SEAM  0.894282) 
(1881  are  0.26907802  5  46  HE  0.499338  RIDGE  0.598608) 

(4275  for  0.96028304  1  2  OR  0.96028304  OR  0.984758) 

(4617  big  0.81938696  1  3  BIG  0.81938696  TEA  0.89177704) 


297 


(1881  ambled  0.6312055  6  24  CAMP  0.896365  MAP  0.9428665) 

(3591  increases  0.038939  19  1522  INCREASE  0.6343647  EASE  0.99570656) 
(6840  and  0.532991  1  38  AN  0.532991  STRAINED  0.971254) 

(1539  sprained  0.79859704  3  3  PLAN  0.80701596  PLAN  0.80701596) 

(4446  steep  0.7146327  2  4  TEA  0.87518597  TEA  0.90740097) 

(513  can  0.534908  5  28  CAN~T  0.6848127  ART  0.89345896) 

(3420  be  0.988761  1  1  BE  0.988761  BE  0.988761) 

(3249  diseases  0.48845783  1  1 13  DISEASES  0.48845783  EASE  0.935228) 
(5985  are  0.447175  1  10  ARE  0.447175  OF  0.977253) 

(855  in  0.266992  7  254  LESS  0.5209915  SEA  0.986476) 

(1881  are  0.47406  5  25  HE  0.65294147  THE  0.86414003) 

(4446  for  0.3037985  14  163  ON  0.499585 1~M  0.731047) 

(4788  big  0.9806285  1  1  BIG  0.9806285  BIG  0.9806285) 

(4617  ambled  0.6953997  4  9  AM  0.937151  AM  0.937151) 

(0  increases  0  1  1888  nil  0  TRAY  0.9058485) 

(1881  is  0.068122  9  255  WRIST  0.514683  TEA  0.73410904) 

(0  controlled  0  1  1398  nil  0  ADD  0.925835) 

(171  from  0.38913667  6  27  RUN  0.678268  THE  0.986308) 

(2052  is  0.186455  14  199  IT  0.43510848  NEAT  0.78260696) 

(3933  for  0.3328665  6  41  OFTEN  0.5137785  EAT  0.560658) 

(4275  buyer  0.8374985  1  1  BUYER  0.8374985  BUYER  0.8374985) 

(3762  money  0.259215  1  372  MONEY  0.259215  ARC  0.80851996) 

(0  for  0  1  609  nil  0  GLEE  0.950649) 

(0  combine  0  1  1392  nil  0  BUY  0.741635) 

(1539  all  0.209123  14  95  LOCK  0.45454934  BUY  0.924078) 

(3078  in  0.4273715  6  29  ILL  0.679192  ERA  0.9314735) 

(2907  by  0.997186  1  1  BUY  0.997186  BUY  0.997186) 

(2052  and  0.1496175  38  309  ANT  0.5470895  ILLNESS  0.7579887) 

(2736  stray  0.93600553  1  1  STRAY  0.93600553  STRAY  0.93600553) 

(4617  for  0.2175155  7  279  ALL  0.6265375  ARE  0.990799) 

(4959  one  0.28297368  2  174  KNICK-  0.30177101  LID  0.6083893) 

(0  of  0  1  640  nil  0  TEA  0.580014) 

(4275  buy  0.959834  1  1  BUY  0.959834  BUY  0.959834) 

(3420  be  0.452547  11  67  ILL  0.73377645  KNEE  0.9064575) 

(4788  right  0.566736  4  40 1-D  0.6067855  AN  0.873578) 

(3249  hand  0.083348334  9  783  HE  0.3612895  ALL  0.978445) 

(3762  are  0.186133  1  255  ARE  0.186133  QUICK  0.84097964) 

(5472  all  0.984129  1  1  ALL  0.984129  ALL  0.984129) 

(2223  errors  0.81481236  2  5  EVER  0.8298127  ALL  0.927791) 

(2736  nearest  0.47288325  7  42  KNEE  0.6990895  ERRORS  0.9959445) 

(3933  may  0.457906  2  29  MAIN  0.476056  ARE  0.81 1062) 

(3078  be  0.160617  6  246  EAT  0.56454754  EAT  0.9635745) 

(0  distance  0  1  1416  nil  0  DEVI  0.895919) 

(2907  buys  0.95487654  4  5  BUY  0.996306  BUY  0.996306) 

(0  catastrophic  0  1  2762  nil  0  TEA  0.99057245) 
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(3762  expense  0.6828584  1  34  EXPENSE  0.6828584  BE  0.997291) 

(3591  his  0.010188666  12  579  EASE  0.033178  LIST  0.829572) 

(0  he  0  1  643  nil  0  ATE  0.971926) 

(0  street  0  1  982  nil  0  LOT  0.979736) 

(3078  needed  0.7002863  3  12  KNEE  0.982007  KNEE  0.982007) 

(2565  be  0.6845275  1  8  BE  0.6845275  CAN  0.813199) 

(0  elegant  0  1  1379  nil  0  ELM  0.9427315) 

(5472  or  0.210486  1  82  OR  0.210486  ELM  0.769936) 

(4788  from  0.19615467  2  268  RUN  0.4887615  TEA  0.8771955) 

(2394  answer  0.9052615  8  9  AN  0.9781835  AN  0.9781835) 

(4275  are  0.833263  2  5  KNEE  0.942628  KNEE  0.942628) 

(1 197  for  0.462328  1  39  OR  0.462328  RIM  0.81242466) 

(4617  big  0.6156905  2  7  IT  0.77520704  TELL  0.8289505) 

(3591  ambled  0.642177  7  9  AIR  0.868595  AIR  0.868595) 

(3591  increases  0.45626396  8  385  LET~S  0.6247363  NEAT  0.99631846) 
(4617  for  0.040271334  6  575  REGARD  0.09105801  OF  0.93189) 

(6498  all  0.242794  2  268  LAWS  0.5888225  OF  0.973385) 

(3078  he  0.7721135  1  1  HE  0.7721135  HE  0.7721135) 

(4275  lie  0.975584  1  1  LIE  0.975584  LIE  0.975584) 

(4617  all  0.789721  1  2  ALL  0.789721  DOLL  0.7901865) 

(0  once  0  1  1216  nil  0  ARE  0.970461) 

(2907  cory  0.99151397  2  3  KEY  0.9985715  KEY  0.9985715) 

(2223  and  0.233779  12  185  CAN-T  0.574616  OR  0.99344) 

(4617  played  0.41 1271  18  170  PLANE  0.68859464  AIN~T  0.8 110955) 
(4104  for  0.0700685  5  275  RIDE  0.14295234  PA  0.5229645) 

(2907  right  0.38094762  33  173  RON~S  0.80220866  THE  0.988127) 
(4104  may  0.997625  1  1  MAY  0.997625  MAY  0.997625) 

(4104  be  0.996968  1  1  BE  0.996968  BE  0.996968) 

(3933  for  0.021034332  7  630  ROB  0.20541  AM  0.779864) 

(0  business  0  1  1064  nil  0  ARE  0.910798) 

(0  for  0  1  675  nil  0  OLE  0.7221335) 

(5985  all  0.741487  1  2  ALL  0.741487  RAW  0.81 1 157) 

(3078  he  0.8284555  1  1  HE  0.8284555  HE  0.8284555) 

(5130  lie  0.987753  1  1  LIE  0.987753  LIE  0.987753) 

(7524  all  0.4718325  1  18  ALL  0.4718325  OF  0.884686) 

(0  once  0  1  1398  nil  0  IN  0.928251) 

(513  cory  0.97680765  1  3  CORY  0.97680765  OR  0.998108) 

(2736  and  0.530195  6  62  CALF  0.6881885  OR  0.989617) 

(3078  played  0.40020248  20  233  PAY  0.93032646  ATE  0.992266) 

(4788  for  0.26758868  5  98  RISE  0.30035168  ALL  0.94527) 

(2907  right  0.483153  9  95  ROCK  0.75791806  MAY  0.966908) 

(2907  may  0.541507  12  91  MAIN  0.6819634  LAY  0.97726905) 

(2907  be  0.781289  1  1  BE  0.781289  BE  0.781289) 

(4446  for  0.06601467  7  410  OF  0.764737  OF  0.764737) 

(0  business  0  1  973  nil  0  OF  0.868821) 
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(3762  are  0.039052  9  399  ROAD  0.0643245  SAW  0.502217) 

(7182  all  0.393565  2  42  KNEE  0.6040865  OF  0.907752) 

(0  errore  0  1  966  nil  0  KNEES  0.692177) 

(4617  nearest  0.66095966  8  13  EAR  0.9452915  EAR  0.9452915) 
(4104  may  0.423365  1  26  MAY  0.423365  MY  0.989461) 

(0  be  0  1  648  nil  0  ACE  0.8981545) 

(0  distance  0  1  1394  nil  0  IS  0.961375) 

(2907  buys  0.6137945  7  7  BUY  0.915691  BUY  0.915691) 

(0  catastrophic  0  1  3297  nil  0  REST  0.897919) 

(2736  call  0.976327  1  1  CALL  0.976327  CALL  0.976327) 

(513  an  0.313829  31  130  ALL  0.99931645  ALL  0.99931645) 

(0  for  0  1  599  nil  0  MEN  0.724507) 

(4959  can  0.942652  1  1  AN  0.942652  AN  0.942652) 

(3933  buy  0.009188  7  779  BUYING  0.500659  LYING  0.955875) 
(3420  mine  0.0843125  12  676 1~LL  0.4023795  IN  0.741367) 

(3420  planned  0.41799632  30  140  LAY  0.8991925  LAY  0.8991925) 
(342  is  0.570024  2  18  IN  0.6154745  OR  0.882951) 

(4617  every  0.9407805  2  2  EVER  0.9578305  EVER  0.9578305) 
(3933  related  0.21486725  8  573  ILL  0.52027947  KNICK-  0.874421) 
(3249  my  0.355187  2  31  MINE  0.51 1854  DOLL  0.7299915) 

(2907  ability  0.8247472  1  12  ABILITY  0.8247472  ILL  0.994556) 
(3933  is  0.918467  1  1  IS  0.918467  IS  0.918467) 

(3762  and  0.474731  1  180  AND  0.474731  ARE  0.921025) 

(4104  are  0.748858  1  5  ARE  0.748858  STAR  0.857826) 

(4617  spend  0.521 13867  8  89  PAN  0.9779655  PAN  0.9779655) 
(2394  in  0.35642648  2  47  IS  0.4397125  AN  0.895222) 

(3591  miami  0.61820334  1  41  MIAMI  0.61820334  AND  0.969271) 
(2736  is  0.038666  36  641  DISH  0.48110098  SPHERE  0.6750593) 
(3078  be  0.037473  6  414  EAT  0.3521325  KNEE  0.946342) 

(5814  need  0.084226  5  628  LEAF  0.4095915  KNICK-  0.9554285) 
(4617  for  0.1 16187505  9  359  ALL  0.4697865  SAW  0.7748395) 
(3762  and  0.124147  8  292  AIR  0.42909652  RAY  0.785141) 

(2565  in  0.491492  1  7  IN  0.491492  SAY  0.5981785) 

(3249  big  0.32406166  5  85  BE  0.53815997  BUY  0.978667) 

(4788  can  0.9533145  1  1  AN  0.9533145  AN  0.9533145) 

(3933  be  0.135168  2  363  ELL  0.4198905  AN  0.9764135) 

(2052  an  0.485607  1  12  AN  0.485607  IN  0.5858715) 

(4446  sport  0.6145024  4  31  PA  0.9179115  KNEE  0.997736) 

(0  for  0  1  720  nil  0  RIDE  0.656766) 

(0  pretty  0  1  1555  nil  0  TEA  0.97172654) 

(684  in  0.1 17268  17  144  WICK  0.523299  KNEE  0.705815) 

(3249  dance  0.37358502  13  255  TAN  0.5558273  SET  0.9965135) 
(3591  and  0.013089667  3  301  SALT  0.014706001  EARS  0.7093715) 
(3249  dislikes  0.82079715  5  36  DATE  0.8628125  ILL  0.994719) 
(4446  for  0.496189  5  61  RAW  0.5017405  OR  0.963705) 
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(0  pretty  0  1  1655  nil  0  RAKED  0.9972455) 

(684  in  0.5634175  1  4  IN  0.5634175  OWN  0.8436115) 

(0  dance  0  1  1606  nil  0  ACT  0.77698195) 

(4104  and  0.17320566  1  186  AND  0.17320566  ACT  0.871338) 

(3249  dislikes  0.64105636  9  105  DEAR  0.85585195  IN  0.9805435) 

(5472  for  0.4007265  1  1  OR  0.4007265  OR  0.4007265) 

(0  pretty  0  1  1529  nil  0  AM  0.991 1685) 

(2223  in  0.43675947  1  6  IN  0.43675947  RAIN  0.5242203) 

(2736  dance  0.41422665  4  98  CAN  0.46373233  AM  0.709285) 

(1368  and  0.042073  22  271  AT  0.49658298  IT  0.7090875) 

(3591  dislikes  0.75019646  3  26  DISK  0.832177  LIKE  0.96909547) 

(4446  of  0.006158  10  419  AIN~T  0.62361467  KNIT  0.7231297) 

(342  and  0.095333666  26  384  LIGHT  0.6205415  ATE  0.93550396) 

(3420  prevented  0.31241018  21  500  RUN  0.745318  IN  0.8815105) 

(4275  from  0.310694  3  131  FRAME  0.49171934  TEA  0.949278) 

(0  contains  0  1  1913  nil  0  AN  0.812647) 

(171  is  0.0979255  9  320  IT  0.457066  STATUS  0.7587625) 

(0  controlled  0  1  1865  nil  0  ROCK  0.9027503) 

(0  from  0  1  928  nil  0  THE  0.9741485) 

(1881  is  0.546827  1  12  EARS  0.546827  ROSE  0.7186857) 

(2565  for  0.21 192399  13  200  FOSTERS  0.571049  SCHOLARS  0.7843865) 
(3762  buyer  0.9712105  1  1  BUYER  0.9712105  BUYER  0.9712105) 

(3762  money  0.9487015  1  2  MONEY  0.9487015  KNEE  0.997469) 

(0  for  0  1  488  nil  0  KNEE  0.994122) 

(2907  is  0.5204265  3  6  IT  0.525493  TRUCK  0.582838) 

(0  controlled  0  1  3133  nil  0  ILL  0.8999145) 

(6156  from  0.5190243  5  59  RUN  0.8244155  SON  0.94738495) 

(2907  is  0.7745025  1  1  IS  0.7745025  IS  0.7745025) 

(3591  for  0.971299  1  3  OR  0.971299  HE  0.98626304) 

(6156  buyer  0.9327045  4  4  BUY  0.993053  BUY  0.993053) 

(3762  money  0.70696163  1  27  MONEY  0.70696163  KNEE  0.995953) 
(5643  for  0.195388  4  153  RUST  0.33259448  US  0.897539) 

(1710  are  0.3504355  1  86  ARE  0.3504355  WE  0.9162665) 

(2394  or  0.21038951  4  55  ON  0.290086  IN  0.4052385) 

(0  proceeding  0  1  1681  nil  0  SEA  0.995776) 

(3591  grows  0.5214253  2  8  REST  0.69385195  REST  0.694663) 

(0  of  0  1  568  nil  0  NUTS  0.64122397) 

(2394  and  0.290285  15  158  IF  0.6820545  KNEE  0.72044504) 

(2223  prevented  0.3 1 159735  23  142  PRINT  0.6367 1 15  TIN  0.87030554) 
(3933  from  0.0714715  5  330  RIM  0.298973  TEN  0.660129) 

(0  contains  0  1  2337  nil  0  TEA  0.9073005) 

(4959  of  0.0348455  2  441  URGING  0.33254933  AL  0.5631 1 15) 

(3249  and  0.089605  19  460  RACK  0.5084225  ALL  0.847959) 

(3591  prevented  0.10798683  17  683  MINUTE  0.27276176  CAN  0.55607) 
(3762  from  0.007928333  15  524  AL  0.363823  ILL  0.662272) 
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(0  contains  0  1  2179  nil  0  ARC  0.8166905) 

(0  retracted  0  1  2392  nil  0  CAT  0.862888) 

(3249  an  0.4710205  2  40  AM  0.4973665  IN  0.892914) 

(3591  michael  0.90742755  2  2  MY  0.962333  MY  0.962333) 

(6156  of  0.14661  1  158  OF  0.14661  ACT  0.81954503) 

(3591  is  0.958272  1  2  IS  0.958272  KNEES  0.9982645) 

(4446  big  0.4738925  4  59  IN  0.5749535  KNIT  0.96743095) 

(2907  in  0.70157254  3  12  ILL  0.853472  US  0.981916) 

(3762  are  0.632435  4  18  ARM  0.7865765  KNOCKED  0.869299) 

(4788  for  0.755607  1  7  OR  0.755607  ARE  0.993783) 

(6156  quick  0.40744534  3  109  CALL  0.476751  ARE  0.969373) 

(1881  in  0.267763  3  32  SWIG  0.32640833  DIG  0.934959) 

(3078  in  0.584768  1  22  IN  0.584768  ARE  0.997551) 

(0  consists  0  1  2455  nil  0  TEA  0.967476) 

(3762  of  0.137881  6  194  DIM  0.144742  BUY  0.848095) 

(4959  and  0.35286567  2  141  DELIGHT  0.48898974  LIE  0.92606) 

(3591  are  0.17587  3  118  LOCK  0.19160901  PA  0.69188154) 

(1881  are  0.164621  9  280  CAR  0.52196753  HE  0.8669315) 

(4104  for  0.6777485  2  14  FORM  0.7003226  LIE  0.989879) 

(4104  big  0.602522  5  19  IN  0.95049596  IN  0.95049596) 

(4788  ambled  0.67039937  4  15  AM  0.987842  AM  0.987842) 

(0  increases  0  1  1849  nil  0  SEA  0.997177) 

(0  retracted  0  1  1989  nil  0  PARK  0.82062) 

(2736  an  0.114817  8  182  AM  0.5384545  KEYED  0.8746815) 

(3591  michael  0.612689  3  20 1~LL  0.617528  HE~LL  0.8983865) 

(3078  of  0.16344  1  212  OF  0.16344  IT  0.621655) 

(684  is  0.570161  3  8  AM  0.626615  EASE  0.81445396) 

(2565  big  0. 182936  30  193  COUNT  0.534191  ILL  0.717885) 

(3420  in  0.1561875  2  67  ILL  0.180489  US  0.508708) 

(0  retracted  0  1  2646  nil  0  READ~V_PRES  0.826543) 

(1197  an  0.193885  11  240  AM  0.51815146  EAT  0.837273) 

(3249  michael  0.98393154  1  1  MICHAEL  0.98393154  MICHAEL  0.98393154) 
(2907  of  0.0856325  2  312  THE  0.099802  ALL  0.877445) 

(2907  is  0.4241225  1  16  IS  0.4241225  NEED  0.803488) 

(4446  big  0.4643455  12  38  IT  0.7826005  IT  0.7826005) 

(4446  in  0.5072425  7  37  ILL  0.7653475  US  0.99567) 

(2907  customer  0.2706992  13  654  COURSE  0.33289066  LEAP  0.9418515) 
(4788  spilled  0.625782  2  12  PILL  0.84381104  DISK  0.8931926) 

(2223  some  0.5282273  2  16  SLID  0.529021  ELM  0.7957475) 

(1368  please  0.447996  12  54  PLACE  0.65568  LAST  0.74928933) 

(3249  dig  0.088084  8  721  DECK  0.51 1532  IN  0.939704) 

(0  my  0  1  977  nil  0  IN  0.7254715) 

(0  ride  0  1  1227  nil  0  BE  0.905133) 

(6669  in  0.043132503  6  273  STICK  0.181892  BEEP  0.88983846) 

(3933  made  0.165121  8  461  MAKE  0.5227025  AM  0.892793) 
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(3420  of  0.397391  1  18  OF  0.397391  AM  0.678249) 

(3762  are  0.639015  2  8  ARM  0.77709  BE  0.995787) 

(3078  in  0.6274035  2  12  IT  0.664878  TEN  0.7760933) 

(0  stockings  0  1  2202  nil  0  CEASE  0.9758805) 

(2907  run  0.29501566  26  248  READ-V_PAST  0.6540343  LIE  0.970846) 
(3591  dishes  0.5675715  2  48  DENSE  0.600734  IS  0.975152) 

(4788  makes  0.948242  1  4  MAKES  0.948242  SEA  0.99613) 

(171  and  0.15636049  16  338  NEXT  0.464755  TEXT  0.89443) 

(3762  made  0.6140315  7  9  MAY  0.98558  MAY  0.98558) 

(6327  of  0.074383  1  314  OF  0.074383  I-M  0.56355697) 

(3078  are  0.1874045  12  248  ROAM  0.4428005  PUMPED  0.53283876) 
(5301  in  0.929425  1  1  IN  0.929425  IN  0.929425) 

(0  stockings  0  1  2456  nil  0  BOX  0.8561 12) 

(4275  run  0.366367  4  192  OWN  0.49622452  AM  0.848271) 

(5472  dishes  0.660367  1  8  DISHES  0.660367  HE  0.73520553) 

(5301  makes  0.9825553  4  4  ACHES  0.993424  ACHES  0.993424) 

(5985  and  0.1184175  5  381  KNOCKED  0.21454935  ARM  0.5451945) 
(4617  for  0.27693734  2  85  RUN  0.501332  OF  0.956916) 

(5130  pretty  0.6544657  4  45  READ~V_PRES  0.84816265  TEA  0.9867315) 
(855  in  0.603928  3  20  IT  0.6808205  KNEE  0.95469) 

(3249  dance  0.63984966  7  40  TEA  0.7941755  AN  0.96578455) 

(684  and  0.1324185  32  292  ANT  0.586987  ACT  0.9515295) 

(0  dislikes  0  1  1664  nil  0  ILL  0.9601395) 

(4446  expense  0.7596216  5  13  SENSE  0.967382  SENSE  0.967382) 

(4959  his  0.08355566  2  301  IS  0.10708  WAS  0.567628) 

(2907  he  0.749623  2  9  HECK  0.760419  LESS  0.788133) 

(2736  street  0.8731268  8  22  SAY  0.988127  ATE  0.990252) 

(3762  needed  0.67678374  3  18  KNEE  0.827742  TIM  0.9020286) 

(3933  be  0.95599  1  1  BE  0.95599  BE  0.95599) 

(0  elegant  0  1  1414  nil  0  IT  0.77017) 

(342  or  0.174021  9  161  ARM  0.5082715  RAW  0.59813) 

(5472  from  0.95581 15  1  3  FROM  0.95581 15  SOME  0.96671903) 

(2565  answer  0.7580135  8  14  AN  0.882728  AN  0.882728) 

(0  forces  0  1  1315  nil  0  OR  0.987609) 

(3933  from  0.073127  32  442  TRY  0.480334  SHE  0.978408) 

(3249  lily  0.2679435  6  201  IN  0.5128795  ARE  0.851787) 

(4104  are  0.014656001  12  522  ODD  0.36625248  ACT  0.830106) 

(3591  of  0.31546  3  118  SAY  0.423462  RAY  0.878229) 

(1026  come  0.77633595  1  2  COME  0.77633595  DUMB  0.858336) 

(3933  makes  0.65662235  6  14  ACHES  0.9655995  ACHES  0.9655995) 

(0  combine  0  1  1 1 15  nil  0  BUY  0.994382) 

(4959  all  0.958342  1  2  ALL  0.958342  LAW  0.97528946) 

(3078  in  0.0237655  10  316  DLL  0.5043615  LIE  0.9072515) 

(4104  by  0.964935  2  4 1~M  0.979793  SOME  0.9947105) 

(4959  and  0.29446033  4  377  LED  0.433824  KICK  0.8618355) 
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(4275  stray  0.98699194  1  1  TRAY  0.98699194  TRAY  0.98699194) 

(1368  for  0.30986765  16  120  ARE  0.997043  ARE  0.997043) 

(5643  one  0.43633166  4  78  KNICK-  0.6675827  ON  0.8909795) 

(3591  of  0.040011  5  235  DID  0.13183634  ROCK  0.6789573) 

(4446  buy  0.956038  1  2  BUY  0.956038  ARE  0.977779) 

(3591  be  0.742095  1  4  BE  0.742095  EASE  0.999831) 

(4446  right  0.28837752  17  220  RAY  0.490272  LACK  0.885265) 

(6498  hand  0.42817  9  1 12  AN  0.996281  AN  0.996281) 

(4959  every  0.9990295  1  1  EVERY  0.9990295  EVERY  0.9990295) 

(2907  related  0.44909462  21  133  REAL  0.76389736  LAY  0.984313) 
(3933  my  0.926636  1  1  MY  0.926636  MY  0.926636) 

(2394  ability  0.6061372  3  59  SALT  0.642286  TEA  0.974556) 

(4446  is  0.98424  1  1  IS  0.98424  IS  0.98424) 

(4617  cat  0.591491  9  49  AL  0.9338845  THEN  0.99337304) 

(2565  my  0.4540415  10  66  MONEY  0.8731494  SON  0.92857754) 

(3591  by  0.143182  23  367 1~D  0.30190098  PA  0.874269) 

(3933  of  0.006348  16  944  VAIN  0.06384  RAW  0.644332) 

(3762  may  0.771679  1  3  MAY  0.771679  KNEE  0.9851785) 

(2736  money  0.781385  1  7  MONEY  0.781385  OR  0.950433) 

(2907  by  0.727661  1  12  BUY  0.727661  ARM  0.94702697) 

(0  hard  0  1  1379  nil  0  ARE  0.991848) 

(4617  candy  0.68516165  3  13  AN  0.97107553  KNEE  0.988752) 

(2736  did  0.2997867  4  87  DEEP  0.611251  BUY  0.979622) 

(3933  buy  0.993957  1  1  BUY  0.993957  BUY  0.993957) 

(5130  any  0.5710857  4  45  KNEE  0.916275  BUY  0.988418) 

(4275  bright  0.7041 1235  1  6  RIGHT  0.7041 1235  ARM  0.975189) 

(0  red  0  1  816  nil  0  ROCK  0.75791 144) 

(3420  his  0.024862334  13  459  STACKED  0.32552403  PECK  0.7506515) 
(5472  for  0.22345366  1  149  DOOR  0.22345366  ANT  0.6002915) 

(3762  is  0.316191  1  126  IS  0.316191  SIT  0.784167) 

(4446  am  0.886832  1  1  AM  0.886832  AM  0.886832) 

(1026  and  0.041617002  14  238  ANT  0.354968  WAS  0.901433) 

(5472  are  0.2018305  1  84  ARE  0.2018305  OF  0.830437) 

(1710  are  0.5438975  7  32  HE  0.961 1795  HE  0.961 1795) 

(0  or  0  1  701  nil  0  ILL  0.976047) 

(0  proceeding  0  1  1944  nil  0  US  0.979233) 

(2736  grows  0.32468253  9  107  GRAB  0.61099535  ERA  0.78116155) 
(3420  made  0.20508951  11  352  CRISS-  0.4166005  TEA  0.8238725) 
(4275  of  0.010019  9  498  QUICK  0.061544668  NEAT  0.891552) 

(2394  are  0.414552  3  34  LOCK  0.56370246  BE  0.978928) 

(2736  in  0.675133  5  6  IT  0.745776  IT  0.745776) 

(3078  stockings  0.62385064  17  74  STAY  0.89382404  ROCK  0.972577) 
(3249  run  0.39825597  28  193  RICE  0.8414853  ICE  0.9786635) 

(3762  dishes  0.3910802  5  327  DEANS  0.48454103  REST  0.7681855) 
(2736  makes  0.6982025  6  46  AM  0.959641  SEA  0.969239) 
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(0  and  0  1  469  nil  0  TEA  0.992323) 

(4104  are  0.35005  3  32  ROCK  0.37062252  IS  0.997174) 

(4104  for  0.7095225  5  7  OUGHT  0.8044215  OUGHT  0.8044215) 

(4104  quick  0.30201 1  3  100  KEY  0.461446  SING  0.84925497) 

(2223  in  0.258143  1  18  IN  0.258143  PAW  0.39657003) 

(3249  in  0.1890435  2  74  WITH  0.20109701  BUY  0.8049575) 

(0  consists  0  1  1297  nil  0  HST  0.99154997) 

(0  of  0  1  438  nil  0  SIT  0.9704745) 

(4275  and  0.19544698  4  154  DEAR  0.25310698  SING  0.88849497) 
(5814  are  0.0049765  3  419  ART  0.0345735  PAW  0.694688) 

(2907  call  0.875729  1  3  CALL  0.875729  ON  0.968277) 

(171  an  0.548427  1  22  AN  0.548427  ARE  0.98519) 

(4275  for  0.2557685  6  172  ALL  0.651551  FROM  0.900974) 

(4788  can  0.9009615  1  1  AN  0.9009615  AN  0.9009615) 

(4275  buy  0.710919  1  2  BUY  0.710919  OR  0.857033) 

(4104  mine  0.9498195  2  3  MY  0.960671  ARE  0.988213) 

(3762  planned  0.637724  9  62  LIKE  0.861 16266  CRACK  0.968679) 
(2736  is  0.4788475  1  13  IS  0.4788475  EASE  0.78007746) 

(3420  expense  0.5947454  3  104  EXPECT  0.6229712  IT  0.914091) 

(4788  his  0.13221733  3  285  IS  0.276094  ACE  0.7157975) 

(0  he  0  1  908  nil  0  ERA  0.8466505) 

(0  street  0  1  861  nil  0  WICK  0.8521465) 

(0  needed  0  1  1 124  nil  0  AND  0.96709) 

(4275  be  0.025206  6  587  BEEP  0.1420555  TEA  0.9724725) 

(0  elegant  0  1  1430  nil  0  AL  0.994593) 

(0  or  0  1  444  nil  0  LIE  0.90619) 

(4617  from  0.6088783  3  20  RUN  0.90785754  RUN  0.90785754) 

(5301  answer  0.54885435  3  45  ANT  0.64589846  ARC  0.7649235) 

(2907  he  0.6483935  1  6  HE  0.6483935  QUICK  0.77675563) 

(4617  from  0.6272177  2  1 1  RUN  0.649046  THE  0.7917725) 

(4275  every  0.42015398  2  33  EVER  0.42262998  KNEE  0.7668975) 

(1 197  in  0.4782535  4  54  IT  0.543578  ARE  0.992748) 

(3762  of  0.004275  4  580  EVER  0.3160905  US  0.8969605) 

(5130  cleaned  0.41090965  3  170  CLEANS  0.5567297  TEA  0.83958554) 
(4104  for  0.8555175  1  2  OR  0.8555175  SCORE  0.8651135) 

(0  combine  0  1  879  nil  0  MAIN  0.8191 1 135) 

(4446  all  0.373455  2  77  DOLL  0.42675567  MAY  0.94514) 

(3078  in  0.020505  26  323  ILL  0.506971  TEA  0.583995) 

(4275  by  0.039996  32  665  MAY  0.616976  MY  0.941262) 

(3078  and  0.161797  20  438  KNEE  0.4649435  NEAT  0.9031915) 

(3591  stray  0.905693  4  7  TRAIN  0.9265587  RAIN  0.9362985) 

(513  for  0.35730067  3  71  ORIGIN  0.53328705  ARE  0.996502) 

(3078  one  0.581821  7  20  WE  0.865209  WE  0.865209) 

(3762  of  0.210603  3  128  RUN  0.41099  RAW  0.79410946) 

(2907  buy  0.968355  2  4 1~LL  0.9695155  ARE  0.9896115) 
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(3762  be  0.965341  1  2  BE  0.965341  ILL  0.967346) 

(4617  right  0.5010265  4  30  RIPE  0.6087665  CRY  0.889336) 

(6156  hand  0.6371227  3  14  AN  0.982571  AN  0.982571) 

(5301  are  0.119466  4  248  ERA  0.28535548  TRAIN  0.6937023) 

(5472  all  0.9261 13  1  1  ALL  0.9261 13  ALL  0.9261 13) 

(4104  errors  0.98899996  1  1  ERRORS  0.98899996  ERRORS  0.98899996) 
(4104  nearest  0.7579281  5  10  IN  0.9966675  IN  0.9966675) 

(3762  may  0.517657  1  8  MAY  0.517657  ARE  0.896779) 

(3078  be  0.194545  1  157  BE  0.194545  ARM  0.641486) 

(3420  distance  0.6459192  11  31  DEAR  0.881236  DEAR  0.881236) 

(2907  buys  0.798103  3  4  BUY  0.962894  BUY  0.962894) 

(0  catastrophic  0  1  3329  nil  0  TEA  0.9020335) 

(1881  are  0.3630765  3  44  ODD  0.739289  THE  0.973252) 

(0  or  0  1  579  nil  0 1~VE  0.96294403) 

(0  proceeding  0  1  1601  nil  0  SEA  0.995391) 

(4446  grows  0.41009966  17  90  ROLE  0.615552  OAK  0.76439154) 

(2565  is  0.8741965  2  2  WICK  0.90106  WICK  0.90106) 

(0  for  0  1  781  nil  0  SKI  0.845081) 

(5301  ended  0.4128945  7  149  DIM  0.6221653  AM  0.93013597) 

(4104  please  0.19960399  17  353  EAT  0.5560055  THE  0.967048) 

(0  cleaners  0  1  1470  nil  0  CLEAN  0.97738004) 

(0  for  0  1  630  nil  0  MAY  0.943681) 

(4275  cleaners  0.43855467  6  89  CLEAN  0.5228985  TRAIN  0.702025) 
(171  and  0.27715  14  231  IN  0.48654452  RUN  0.5500855) 

(5472  are  0.019608  10  352  AID  0.09395  KNEE  0.748146) 

(4275  spend  0.4836217  13  61  SPENT  0.7579193  SPENT  0.7579193) 
(3078  in  0.348754  1  2  IN  0.348754  TIN  0.43380448) 

(3762  miami  0.5164487  1  26  MIAMI  0.5164487  TEA  0.8574625) 

(2736  is  0.6977865  1  1  IS  0.6977865  IS  0.6977865) 

(3078  be  0.993753  1  1  BE  0.993753  BE  0.993753) 

(2565  need  0.340403  4  58  KNEE  0.620178  RUN  0.69343996) 

(701 1  for  0.08312634  7  318  ADD  0.35571003  ILL  0.871887) 

(1026  and  0.0968655  14  612  VAULTING  0.381881  TALL  0.877313) 
(4275  in  0.367938  5  56  IT  0.4351335  KNEES  0.68775797) 

(4446  big  0.681351  1  11  BIG  0.681351  SAY  0.95214105) 

(0  can  0  1  685  nil  0  BE  0.959199) 

(3420  be  0.98315  1  1  BE  0.98315  BE  0.98315) 

(3591  an  0.189566  28  183  AND  0.567212  BE  0.949433) 

(4788  sport  0.51418567  7  18  PACK  0.818434  KNACK  0.8256875) 

(3420  made  0.95283103  1  1  AID  0.95283103  AID  0.95283103) 

(3591  of  0.027146  15  407  ACT  0.08245  MAKE  0.9187205) 

(3078  are  0.953252  1  1  ARE  0.953252  ARE  0.953252) 

(171  in  0.562705  1  8  IN  0.562705  THE  0.7023735) 

(0  stockings  0  1  2465  nil  0  STOCK  0.995908) 

(4104  run  0.913673  1  1  RUN  0.913673  RUN  0.913673) 


306 


(2052  dishes  0.40857852  2  398  DISK  0.56346303  SOME  0.98439) 

(4104  makes  0.59021235  15  46  ATE  0.934686  ATE  0.988361) 

(0  and  0  1  340  nil  0  KNEE  0.8555905) 

(171  he  0.9172605  1  1  HE  0.9172605  HE  0.9172605) 

(4617  from  0.331668  2  224  FRAME  0.336658  ATE  0.93441 1) 

(3762  every  0.49218902  2  18  KNEE  0.5030065  EASE  0.935871) 

(1368  in  0.3063505  10  82  IT  0.4977535  AT  0.6301 11) 

(3762  of  0.0130225  1  499  OF  0.0130225  ATE  0.74061203) 

(3762  cleans  0.38865525  19  196  KEY  0.81421 196  KEY  0.81421 196) 

(3933  for  0.419846  1  48  OR  0.419846  FISH  0.956377) 

(2565  customer  0.696346  1  24  CUSTOMER  0.696346  OF  0.921 176) 

(4617  spilled  0.7269047  1  11  SPILLED  0.7269047  ELSE  0.92272496) 

(1539  some  0.034870334  12  452  OF  0.4611685  SEX  0.986068) 

(4275  please  0.20984267  19  470  LEAST  0.780599  LEAST  0.82355565) 

(0  dig  0  1  897  nil  0  AIR  0.7668725) 

(1710  my  0.196981  16  553  MARKETS  0.5662316  ARE  0.963941) 

(4446  ride  0.4139885  8  271  ART  0.5896623  ATE  0.924378) 

(4446  in  0.300778  1  83  IN  0.300778  EAT  0.6923665) 

(4275  forces  0.6008395  15  68  OFFER  0.99699247  OFFER  0.99699247) 

(4104  from  0.45025602  11  31  REAL  0.7079243  IS  0.775393) 

(855  lily  0.67164%3  1  12  LILY  0.67164963  LEAD  0.9863795) 

(171  are  0.631936  3  23  ARC  0.79821  CAT  0.85508454) 

(3420  of  0.025864  5  643  VAULT  0.105037995  BUY  0.85267) 

(2736  come  0.4187875  10  80  KEY  0.876138  TEA  0.94527197) 

(2565  makes  0.554786  7  32  AM  0.982953  AM  0.982953) 

(2736  cat  0.79937667  4  1 1  CAN  0.8466533  BE  0.955984) 

(3249  my  0.05082  36  517  MINE  0.44738498  KNEE  0.84395) 

(3078  by  0.3191 13  6  62 1~VE  0.63817847  IS  0.7278055) 

(3591  of  0.224802  2  210  LAMPS  0.298864  ANT  0.777408) 

(3420  may  0.994372  1  1  MAY  0.994372  MAY  0.994372) 

(2565  money  0.6219138  12  37  MAIN  0.87258697  OF  0.992532) 

(4104  by  0.961597  2  2 1~M  0.97637403 1~M  0.97637403) 

(0  hard  0  1  1453  nil  0  ARE  0.995585) 

(4788  candy  0.7092225  13  51  AN  0.9659385  AN  0.9659385) 

(4446  are  0.998812  1  1  ARE  0.998812  ARE  0.998812) 

(5985  for  0.25463632  5  328  CHORUSED  0.37175626  NEAT  0.9625765) 

(171  quick  0.7866073  1  12  QUICK  0.7866073  ARE  0.946069) 

(2052  in  0.5334605  1  9  IN  0.5334605  END  0.731756) 

(3420  in  0.676669  1  5  IN  0.676669  BUY  0.859683) 

(0  consists  0  1  1245  nil  0  SPHERE  0.920451) 

(3762  of  0.917667  1  1  OF  0.917667  OF  0.917667) 

(2736  and  0.401804  4  128  AM  0.48402652  SING  0.930416) 

(4275  are  0.905255  1  2  ARE  0.905255  PA  0.9336165) 

(3420  intelligence  0.5118874  2  153  INTELLIGENT  0.53903586  EDGE  0.8861355) 
(2736  is  0.605402  8  12  IF  0.94380903  IF  0.94380903) 
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(4275  for  0.83752096  1  13  OR  0.83752096  KEY  0.941649) 

(0  become  0  1  1623  nil  0  BE  0.969887) 

(0  artists  0  1  1912  nil  0  ARE  0.999065) 

(4959  fish  0.76207745  1  4  FISH  0.76207745  THIS  0.98891 15) 

(4446  leap  0.9622715  1  1  LEAP  0.9622715  LEAP  0.9622715) 

(5814  frantically  0.5103947  3  293  ANT  0.5835723  ON  0.97726345) 

(2565  of  0.316217  1  45  OF  0.316217  ODD  0.7949635) 

(0  forces  0  1  1238  nil  0  STRESS  0.93730706) 

(4104  from  0.4906535  2  8  REAL  0.512767  IS  0.820791) 

(3249  lily  0.54333335  1  87  LILY  0.54333335  LED  0.9229655) 

(4617  are  0.830769  1  6  ARE  0.830769  TEA  0.934932) 

(6498  of  0.010472  1  728  OF  0.010472  ILL  0.615679) 

(2907  come  0.7918515  5  13  COW  0.8600515  TOWN  0.9834515) 

(4446  makes  0.94201 165  4  7  MAY  0.996417  MAY  0.996417) 

(2907  cat  0.77059835  2  8  CAT-S  0.82733977  ELSE  0.938785) 

(3249  my  0.063865  8  690  AIR  0.2639265  ARE  0.8623) 

(3249  by  0.1223  5  313 1~LL  0.519367  IN  0.961836) 

(3933  of  0.014204  14  377  ILL  0.173008  AM  0.7062485) 

(3420  may  0.867208  1  1  MAY  0.867208  MAY  0.867208) 

(2223  money  0.6645517  5  24  MY  0.787362  MY  0.963306) 

(3591  by  0.81 1009  1  7  BUY  0.81 1009  NOR  0.93372047) 

(0  hard  0  1  1355  nil  0  ARE  0.991059) 

(4446  candy  0.388333  28  262  ADD  0.791 1 14  TAD  0.881927) 

(4959  for  0.1684645  7  321  ON  0.345944  OF  0.917015) 

(3078  pretty  0.45062673  7  115  PRAYED  0.55536026  KNEE  0.94666946) 
(2565  in  0.196995  8  109  LID  0.437546  KNEE  0.915486) 

(0  dance  0  1  1479  nil  0  AT  0.9657485) 

(0  and  0  1  457  nil  0  AM  0.79846) 

(0  dislikes  0  1  2473  nil  0  IT  0.8793355) 

(0  retracted  0  1  2674  nil  0  RACK  0.9876833) 

(1197  an  0.06891 15  10  187  AT  0.47751898  TEA  0.541239) 

(3420  michael  0.7428125  6  11 1~M  0.8559135  THE  0.8988155) 

(3249  of  0.409402  7  1 15  AT  0.53922  ART  0.9003095) 

(3078  is  0.7876595  1  4  EARS  0.7876595  KNEES  0.9990305) 

(2907  big  0.63857096  6  9  BE  0.868748  BE  0.868748) 

(2736  in  0.38117698  1  39  IN  0.381 17698  ELSE  0.9379525) 

(0  customer  0  1  1927  nil  0  US  0.994631) 

(4788  spilled  0.7035637  7  41  KILLED  0.91600066  ILL  0.989218) 

(1368  some  0.037465002  7  535  LOCKED  0.35230434  LET  0.910957) 

(1368  please  0.17267467  22  309  PLACED  0.54949224  KISSED  0.6640133) 
(3591  dig  0.12736  18  501  ACT  0.6418087  LISTED  0.84387374) 

(0  my  0  1  580  nil  0  GRINNED  0.66892654) 

(2736  ride  0.54452664  7  27  ROAD  0.8167493  ROAD  0.9402815) 

(2565  in  0.22683099  9  130  IT  0.617258  AT  0.723625) 

(4104  are  0.563125  2  28  RAW  0.7813345  OR  0.999544) 
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(4788  for  0.995764  1  1  OR  0.995764  OR  0.995764) 

(1026  big  0.5722765  7  23  ILL  0.8902275  ILL  0.899718) 

(3420  ambled  0.54366124  14  26  CALF  0.9260225  CALF  0.9260225) 

(1368  increases  0.6184137  2  98  INCREASE  0.65949774  LEAST  0.978867) 
(2907  customer  0.76898956  5  14  TEA  0.947944  US  0.99869454) 

(4617  spilled  0.8451383  2  8  PILL  0.865184  US  0.98971) 

(2907  some  0.57434696  9  42  SEX  0.63510865  ELM  0.925004) 

(3933  please  0.48703066  4  70  EASE  0.730546  EASE  0.990975) 

(4104  dig  0.391167  9  134  DIM  0.6540757  MY  0.964345) 

(3249  my  0.962133  1  1  MY  0.962133  MY  0.962133) 

(2907  ride  0.57451 1  8  72  RACK  0.8173527  ADD  0.9679675) 

(2907  in  0.201971  6  52  SOME  0.505221  SOME  0.505221) 

(5 130  are  0.2361745  5  210  RUN  0.9478655  RUN  0.9478655) 

(4104  all  0.345022  9  128  MONEY  0.71685934  TEA  0.7587085) 

(5472  errors  0.433757  3  149  ROAD  0.46167198  LEAD  0.9250475) 

(4104  nearest  0.28203964  16  239  IN  0.577139  BE  0.740341) 

(3591  may  0.892892  1  1  MAY  0.892892  MAY  0.892892) 

(2907  be  0.336387  1  22  BE  0.336387  ROCK  0.702994) 

(2223  distance  0.6228146  1  34  DISTANCE  0.6228146  TEA  0.9120685) 
(3078  buys  0.692306  3  5  BUY  0.720777  EYES  0.72473097) 

(0  catastrophic  0  1  3294  nil  0  STOCK  0.97606003) 

(0  combine  0  1  1522  nil  0  BUY  0.967192) 

(5472  all  0.994891  1  1  ALL  0.994891  ALL  0.994891) 

(3078  in  0.1521735  4  168  TEAM  0.494284  TEAM  0.660109) 

(3933  by  0.292749  5  199  I-M  0.5886125  AM  0.87930596) 

(1710  and  0.331002  17  331  DISK  0.653731  CENT  0.7352383) 

(4788  stray  0.7577805  4  14  TRAIN  0.81299734  CRY  0.8825245) 

(513  for  0.112957336  5  206  ORIGIN  0.21438925  IF  0.6195735) 

(5985  one  0.28148398  9  209  AND  0.49153  READ~V_PRES  0.6938086) 
(684  of  0.5077385  1  12  OF  0.5077385  LAWS  0.7075705) 

(3078  buy  0.777875  1  6  BUY  0.777875  PA  0.98566604) 

(3420  be  0.990867  1  1  BE  0.990867  BE  0.990867) 

(4446  right  0.6455705  3  9 1~D  0.661486  AN  0.810198) 

(6498  hand  0.30770633  2  173  AND  0.4615595  IN  0.76886404) 

(4959  and  0.18954599  16  340  AN  0.375376  TIM  0.760818) 

(4104  sprained  0.82729375  4  6  PAIN  0.9443275  PAIN  0.9443275) 

(4959  steep  0.5203943  10  49  TEA  0.9196435  TEA  0.9196435) 

(2907  can  0.42971066  6  46  CARE  0.6598744  AIR  0.86047745) 

(3420  be  0.999839  1  1  BE  0.999839  BE  0.999839) 

(3420  diseases  0.76576793  1  18  DISEASES  0.76576793  EASE  0.998306) 
(4617  are  0.646894  1  3  ARE  0.646894  EAR  0.792076) 

(2052  in  0.87319005  1  1  IN  0.87319005  IN  0.87319005) 

(3249  he  0.74578655  1  2  HE  0.74578655  THESE  0.777815) 

(4788  from  0.60703564  1  10  FROM  0.60703564  AIN~T  0.80482304) 

(513  every  0.35394  8  196  TELL  0.621548  IF  0.92564654) 
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(1710  in  0.4313595  4  42  IT  0.5647165  IT  0.5647165) 

(2736  of  0.035078  8  324  HONEST  0.48596674  DARK  0.54200935) 

(855  cleans  0.17036575  42  526  RAIN  0.4130055  BRAIN  0.83505404) 

(0  for  0  1  704  nil  0  ROLE  0.7933965) 

(3420  of  0.162165  1  210  OF  0.162165  MOUNT  0.7292686) 

(3762  and  0.39425197  4  51  AN  0.76499  KNEE  0.8294545) 

(0  prevented  0  1  1974  nil  0  ARE  0.975677) 

(0  from  0  1  540  nil  0  ERA  0.93445253) 

(0  contains  0  1  2166  nil  0  TEA  0.99684155) 

(5130  intelligence  0.8044853  2  24  INTELLIGENT  0.81 194705  EDGE  0.99919105) 
(5472  is  0.968724  1  1  IS  0.968724  IS  0.968724) 

(7182  for  0.36635366  4  218  REAL  0.664661  REAL  0.9362235) 

(0  become  0  1  1967  nil  0  BE  0.982635) 

(4788  artists  0.24544099  42  653  ARE  0.932992  BUY  0.937533) 

(5301  fish  0.7248385  2  3  IN  0.915028  IN  0.915028) 

(4446  leap  0.493816  1  24  LEAP  0.493816  TEA  0.96826) 

(5301  frantically  0.37154865  24  333  RAN  0.587496  KEY  0.91041553) 

(171  of  0.501342  2  13  WAS  0.6137665  THE  0.8197855) 

(0  retracted  0  1  2336  nil  0  RACK  0.957482) 

(2052  an  0.2162705  20  152  SICK  0.61050606  OF  0.832635) 

(3249  michael  0.958744  1  1  MICHAEL  0.958744  MICHAEL  0.958744) 

(0  of  0  1  356  nil  0  DIM  0.78959095) 

(1881  is  0.37623  5  42  DREAM  0.49455667  OF  0.8292395) 

(4275  big  0.94293153  1  1  BIG  0.94293153  BIG  0.94293153) 

(4104  in  0.339775  3  43  NEAR  0.43979633  EVERY  0.8512975) 

(4959  are  0.259575  9  287  ARC  0.4677135  CRASH  0.60771 13) 

(3420  or  0.580724  1  4  OR  0.580724  ARE  0.984356) 

(0  proceeding  0  1  2005  nil  0  SEA  0.9981 1) 

(3933  grows  0.5591417  23  51  GRILL  0.9464357  GRILL  0.9464357) 

(3933  call  0.73034453  1  2  ALL  0.73034453  RUN  0.86439097) 

(4788  an  0.92885053  1  1  AN  0.92885053  AN  0.92885053) 

(0  for  0  1  1008  nil  0  IT  0.874666) 

(3933  can  0.959235  1  1  CAN  0.959235  CAN  0.959235) 

(4275  buy  0.853348  1  5  BUY  0.853348  ARE  0.985204) 

(3762  mine  0.505879  1  25  MINE  0.505879  EASE  0.998016) 

(1 197  planned  0.08599333  15  867  MEANT  0.47092232  QUICK  0.6518717) 

(5472  is  0.4919295  3  55  IT  0.50881  OUGHT  0.97591996) 

(2565  is  0.706433  5  1 1  IT  0.9841635  ARE  0.993545) 

(0  controlled  0  1  1591  nil  0  SAYS  0.9839845) 

(3762  from  0.344589  4  66  RAIN  0.5358917  SON  0.658935) 

(2736  is  0.20857549  13  238  IT  0.53426003  REST  0.7541354) 

(4617  for  0.6953935  5  19  OUGHT  0.95354295  OUGHT  0.95354295) 

(3933  buyer  0.9765045  3  3  BUY  0.999261  BUY  0.999261) 

(3078  money  0.6815513  1  19  MONEY  0.6815513  MY  0.965821) 

(171  for  0.31384933  8  149  RUNS  0.57116896  KNEE  0.919641) 
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(3762  are  0.102933  3  224  ROAD  0.11515299  TAN  0.794415) 

(6840  all  0.988596  1  2  ALL  0.988596  SALT  0.99028164) 

(0  errors  0  1  1272  nil  0  ALL  0.893047) 

(4104  nearest  0.5596807  5  51  IT  0.7421895  SON  0.799916) 

(3591  may  0.937341  1  1  MAY  0.937341  MAY  0.937341) 

(3249  be  0.25528  13  229  BUILT  0.6265753  PILL  0.87550247) 

(3762  distance  0.43542424  3  139  REST  0.5544107  SETS  0.83637196) 
(3078  buys  0.46647102  11  52  BITES  0.8312975  BITES  0.8312975) 

(0  catastrophic  0  1  2599  nil  0  LAST  0.98519444) 

(2394  are  0.006498  14  485  ODD  0.421895  IS  0.937666) 

(4104  for  0.5060835  5  44  ALL  0.8072635  PICK  0.913553) 

(2736  quick  0.57827365  8  41  KILL  0.809182  OR  0.970962) 

(3933  in  0.2490525  3  103  ILL  0.2541455  AM  0.988533) 

(513  in  0.27841  3  40  EARS  0.4405765  BUY  0.995865) 

(0  consists  0  1  1352  nil  0  IT  0.940386) 

(2736  of  0.851429  1  2  OF  0.851429  SIT  0.93266404) 

(2565  and  0.1344525  8  264  TELL  0.584324  STREETS  0.92195153) 
(3591  are  0.832452  2  3  ARC  0.8809045  ARC  0.8809045) 

(2736  customer  0.8599866  1  7  CUSTOMER  0.8599866  US  0.99727) 
(4788  spilled  0.9325223  2  4  PILL  0.9482985  PILL  0.9482985) 

(2907  some  0.447724  7  80  SEX  0.8478367  ILL  0.996894) 

(1539  please  0.60795504  1  26  PLEASE  0.60795504  EASE  0.9112005) 
(3420  dig  0.48332602  1  39  DIG  0.48332602  ATE  0.981412) 

(3933  my  0.794769  1  1  MY  0.794769  MY  0.794769) 

(2907  ride  0.6087447  3  8  WRIST  0.6714106  RAIN  0.820413) 

(4275  in  0.536494  2  2  NIP  0.643829  NIP  0.643829) 

(3420  of  0.554975  1  29  OF  0.554975  AND  0.90585697) 

(2052  and  0.127388  21  250  LAUGH  0.538555  TREE  0.7854405) 

(3420  prevented  0.22422881  55  886  RIM  0.92980963  TRIM  0.942167) 
(5130  from  0.4207315  1  37  FROM  0.4207315  AM  0.978893) 

(0  contains  0  1  1573  nil  0  TEA  0.881098) 

(6498  are  0.2457265  1  191  ARE  0.2457265  KEY  0.896343) 

(4275  for  0.957553  1  2  OR  0.957553  OR  0.965848) 

(4446  big  0.700608  5  8  ILL  0.7912135  ILL  0.7912135) 

(4275  ambled  0.64620626  7  13  AM  0.864462  AM  0.956882) 

(0  increases  0  1  1569  nil  0  CEASE  0.965352) 

(4959  for  0.287623  6  86  ON  0.44945902  NOW  0.796309) 

(0  pretty  0  1  1590  nil  0 1~D  0.8393295) 

(342  in  0.5563405  1  1  IN  0.5563405  IN  0.5563405) 

(4104  dance  0.65918064  1  21  DANCE  0.65918064  AN  0.949515) 

(684  and  0.5202895  14  52  ANT  0.842291  TEN  0.86545604) 

(3762  dislikes  0.6922227  9  53  ILL  0.87714505  LIKE  0.9881125) 

(3762  are  0.061782  22  247  ARM  0.521851  LESS  0.7447065) 

(0  for  0  1  484  nil  0  EAR  0.783261) 

(5130  quick  0.8222285  1  9  WICK  0.8222285  EAT  0.9561425) 
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(4788  in  0.105167  5  227  NIP  0.2075245  LAW  0.4420625) 

(2052  in  0.572961  2  8  DIG  0.5801695  ARE  0.987601) 

(0  consists  0  1  1373  nil  0  OF  0.943267) 

(3591  of  0.942059  1  1  OF  0.942059  OF  0.942059) 

(2394  and  0.24205601  4  221  AL  0.58170754  ARE  0.76434255) 

(4959  are  0.0126705  2  629  ODD  0.021 1 16  TEA  0.87823653) 

(2394  is  0.2736015  1 1  252  IT  0.7267795  IT  0.7267795) 

(0  controlled  0  1  2161  nil  0  ODD  0.983926) 

(4104  from  0.75789803  1  1  FROM  0.75789803  FROM  0.75789803) 

(2223  is  0.0807145  10  364  WICK  0.4238055  RAPE  0.640555) 

(4788  for  0.6092925  3  3  ON  0.7343955  ON  0.7343955) 

(4275  buyer  0.9718395  1  1  BUYER  0.9718395  BUYER  0.9718395) 

(3420  money  0.708845  2  4  US  0.9327085  US  0.9327085) 

(342  for  0.16528967  22  400  KNEES  0.53427  KNEE  0.833307) 

(0  of  0  1  721  nil  0  TIM  0.6788815) 

(3762  and  0.134182  31  270  ANT  0.3890175  LAY  0.950136) 

(3591  prevented  0.084134206  16  1009  ROB  0.44178602  IT  0.7976395) 

(0  from  0  1  829  nil  0  SEND  0.55283433) 

(0  contains  0  1  1788  nil  0  STRAINED  0.9250145) 

(4104  and  0.20510833  5  21 1  DID  0.25674367  TEA  0.91370046) 

(4617  are  0.36019748  1  24  ARE  0.36019748  THREE  0.62333035) 

(4275  spend  0.46326566  2  90  SPILLED  0.46927235  MY  0.978548) 

(342  in  0.5909885  9  22  AM  0.74917805  MY  0.966139) 

(3249  miami  0.9447363  3  4 1~D  0.991473 1~D  0.991473) 

(684  is  0.42749602  14  44  IT  0.82759905  IT  0.82759905) 

(3078  be  0.653656  2  12  BEING  0.681274  NEED  0.957286) 

(2565  need  0.638063  2  6  KNEE  0.87804  IF  0.900002) 

(4617  for  0.8652715  1  1  OR  0.8652715  OR  0.8652715) 

(1368  and  0.25122398  3  281  NEARING  0.318531  ROLE  0.85107005) 

(513  in  0.32452  2  34  IS  0.5336035  EASE  0.67721355) 

(4275  big  0.999287  1  1  BIG  0.999287  BIG  0.999287) 

(2736  can  0.454583  2  27  KEY  0.60292053  BE  0.973139) 

(3420  be  0.923087  1  1  BE  0.923087  BE  0.923087) 

(2223  an  0.268826  4  72  THAN  0.34334052  BE  0.956785) 

(5301  sport  0.47390667  12  174  CART  0.6281 127  ARE  0.892985) 

(5472  all  0.647157  1  8  ALL  0.647157  SAW  0.810637) 

(2565  are  0.4571 155  6  136  LOCK  0.76775  ALL  0.948955) 

(4617  expected  0.5492617  10  83  STACKED  0.7324807  ALL  0.804497) 
(2907  comes  0.121709995  18  604  COURSE  0.37037733  REST  0.7497035) 
(5130  for  0.26589867  6  226  BORING  0.4284985  MAY  0.933095) 

(2052  his  0.5696475  1  5  HIS  0.5696475  IS  0.68533) 

(0  broken  0  1  1684  nil  0  KIN  0.635677) 

(4446  all  0.901629  1  1  ALL  0.901629  ALL  0.901629) 

(3591  are  0.972313  2  2  ARC  0.98222196  ARC  0.98222196) 

(4959  expected  0.53827965  21  118  CENT  0.8555587  ARC  0.985645) 
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(171  comes  0.52415234  12  56  REST  0.7411865  SOME  0.851967) 

(0  for  0  1  386  nil  0  RUN  0.812273) 

(1881  his  0.5154885  1  4  HIS  0.5154885  THE  0.5859575) 

(2907  broken  0.4907278  6  44  BELT  0.57648134  RENT  0.7241013) 

(2736  call  0.70741934  4  6  CORN  0.7425173  OR  0.825639) 

(1539  an  0.375216  3  52  RUN  0.60828197  AM  0.997437) 

(0  for  0  1  421  nil  0  MEN  0.61 14845) 

(4617  can  0.60615647  1  1  AN  0.60615647  AN  0.60615647) 

(2907  buy  0.47503302  8  83  BALL  0.71575403  ARE  0.991208) 

(3591  mine  0.5331815  1  78  MINE  0.5331815  TRIED  0.814265) 

(3420  planned  0.086512335  26  702  TEN  0.3818285  HEN  0.7003375) 
(4104  is  0.4446375  1  59  IS  0.4446375  OR  0.979491) 

(4617  every  0.844344  2  2  EVER  0.99694896  EVER  0.99694896) 

(0  related  0  1  1265  nil  0  AIN-T  0.81 1343) 

(3933  my  0.985477  1  1  MY  0.985477  MY  0.985477) 

(3249  ability  0.362378  2  129  ABYSS  0.38104436  IN  0.8585845) 

(2052  is  0.5198515  1  6  IS  0.5198515  HELL  0.7518625) 

(3249  is  0.46853197  1  13  IS  0.46853197  ARE  0.9082) 

(0  controlled  0  1  1569  nil  0  ELSE  0.9256515) 

(5472  from  0.31060448  4  155  ROB  0.50110066  RAY  0.922059) 

(1539  is  0.51759946  2  4  RIPE  0.58055604  RIPE  0.58055604) 

(6498  for  0.6307355  5  7  OUGHT  0.96566653  OUGHT  0.96566653) 

(4104  buyer  0.5623455  1  27  BUYER  0.5623455  DIM  0.93507254) 

(4104  money  0.8814573  1  5  MONEY  0.8814573  KNEE  0.999766) 

(3249  for  0.35635433  3  126  CHORUSED  0.3884872  KNEE  0.99935) 
(2565  intelligence  0.52076185  5  51  ENGINE  0.67599326  TEN  0.9033555) 
(1026  is  0.30459398  8  109  IF  0.5357565  LEAF  0.5639385) 

(0  for  0  1  773  nil  0  KNEE  0.982102) 

(0  become  0  1  1625  nil  0  BE  0.987363) 

(0  artists  0  1  2152  nil  0  SHE  0.8535655) 

(4446  fish  0.9354415  1  2  HSH  0.9354415  BE  0.983377) 

(3933  leap  0.089551  16  633  EAT  0.4401735  TEA  0.940344) 

(0  frantically  0  1  2890  nil  0  AN  0.834304) 

(3591  of  0.959988  1  1  OF  0.959988  OF  0.959988) 

(6156  and  0.040183667  4  255  ILL  0.05457  ACTS  0.67574733) 

(4104  sprained  0.78803277  3  18  PAIN  0.98850703  PAIN  0.98850703) 
(5301  steep  0.50072736  2  34  TEA  0.6404525  KNEE  0.9408915) 

(0  can  0  1  742  nil  0  ARC  0.875213) 

(3762  be  0.986445  1  1  BE  0.986445  BE  0.986445) 

(3762  diseases  0.62148863  1  16  DISEASES  0.62148863  IT  0.89462197) 
(1881  are  0.34065098  5  67  SOME  0.39545548  RAY  0.796336) 

(3933  in  0.3540065  7  152  RISKS  0.6405156  SEA  0.970379) 

(342  for  0.365057  6  151  ON  0.5054515  ALL  0.993864) 

(4788  pretty  0.435367  12  106  RENT  0.6633123  KNEE  0.9692675) 

(171  in  0.232987  9  101  EAR  0.479839  KNEE  0.6929615) 
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(0  dance  0  1  1032  nil  0  HOW  0.847875) 

(2223  and  0.20224732  26  237  KNICK-  0.73492897  CENT  0.9029353) 
(0  dislikes  0  1  1663  nil  0  ACT  0.9327345) 

(2394  is  0.5131435  1  4  IS  0.5131435  WAS  0.57172596) 

(4788  for  0.0159135  22  454  ROAM  0.503335  KEY  0.98454547) 

(4275  ended  0.23631726  14  212  DIM  0.7827057  DIM  0.98747504) 
(2052  please  0.39836535  4  131  PICK  0.5465825  WE  0.974495) 

(4788  cleaners  0.36770633  2  97  IF  0.5494435  IF  0.64037) 

(4104  for  0.9355215  3  3  OUGHT  0.9671535  OUGHT  0.9671535) 
(2223  cleaners  0.2655725  30  312  KEY  0.7671885  TEA  0.846433) 
(4275  the  0.454218  2  10  LED  0.692657  PECK  0.79161704) 

(2565  the  0.745192  1  4  THE  0.745192  OR  0.853809) 

(2565  the  0.902124  1  1  THE  0.902124  THE  0.902124) 

(3762  the  0.96634  1  1  THE  0.96634  THE  0.96634) 

(2736  the  0.25974947  7  123  THIS  0.668189  THIS  0.668189) 

(2223  the  0.1079595  3  172  THUS  0.468537  US  0.741017) 

(2736  sun  0.7762487  4  5  US  0.968899  US  0.968899) 

(1368  the  0.3803755  3  49  THUS  0.628791  NECK  0.73513603) 

(0  worn  0  1  1019  nil  0  RAIN  0.9902245) 

(171  the  0.9818715  1  1  THE  0.9818715  THE  0.9818715) 

(1368  the  0.005612  27  393  ROB  0.19858234  PARK  0.60054034) 
(2223  the  0.4501 19  1  39  THE  0.4501 19  MY  0.968003) 

(2223  will  0.31016633  2  58  KNEE  0.519023  BE  0.988622) 

(3078  the  0.42693752  1  2  THE  0.42693752  OF  0.845197) 

(2565  the  0.95262  1  1  THE  0.95262  THE  0.95262) 

(2223  the  0.098076  2  176  WAS  0.2025785  WISH  0.659184) 

(3078  the  0.089277  1  237  THE  0.089277  IN  0.47251 1) 

(4617  the  0.473477  1  10  THE  0.473477  RENT  0.602642) 

(3249  was  0.8800345  1  2  WAS  0.8800345  IS  0.91925) 

(171  the  0.041274  19  269  ILL  0.8286385  BUY  0.999808) 

(2394  the  0.575898  3  9  SOME  0.6027925  KNIT  0.78812504) 

(0  treats  0  1  2297  nil  0  TEA  0.8071 135) 

(2565  the  0.82877004  1  1  THE  0.82877004  THE  0.82877004) 

(3762  was  0.99162304  1  1  WAS  0.99162304  WAS  0.99162304) 

(684  the  0.2619755  3  55  THEN  0.40894452  MEN  0.789021) 

(2907  the  0.527025  1  14  THE  0.527025  CEASE  0.5825963) 

(0  the  0  1  430  nil  0  ATE  0.843047) 

(3933  we  0.0542905  16  572  CLASS  0.48572233  TELL  0.84219646) 
(171  the  0.77612746  1  3  THE  0.77612746  BUY  0.853089) 

(3591  wore  0.22754  12  191  WARN  0.44194567  IN  0.74606645) 

(3078  the  0.5477635  1  34  THE  0.5477635  ALL  0.889541) 

(2394  the  0.5104145  1  25  THE  0.5104145  AM  0.805595) 

(171  this  0.500427  5  37  THE  0.7626285  WHISPER  0.8215667) 

(171  the  0.442171  3  47  WITH  0.5069965  AM  0.9849055) 

(684  the  0.027683001  5  5 19  DRINK  0.1 115665  SKI  0.9797725) 
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(342  we  0.227364  7  208  ILL  0.4663275  AL  0.914455) 

(0  was  0  1  584  nil  0  IT  0.866098) 

(3591  the  0.943863  1  1  THE  0.943863  THE  0.943863) 

(2736  the  0.504444  1  21  THE  0.504444  ARC  0.806921) 

(2565  the  0.875511  1  1  THE  0.875511  THE  0.875511) 

(2736  sun  0.84010935  1  1  SON  0.84010935  SON  0.84010935) 
(2394  the  0.70100904  1  2  THE  0.70100904  RUN  0.775074) 

(0  worn  0  1  790  nil  0  OWN  0.9890425) 

(2565  the  0.5653075  1  1  THE  0.5653075  THE  0.5653075) 

(2565  the  0.52978903  1  16  THE  0.52978903  ART  0.8994985) 
(4275  this  0.47602868  8  89  SIT  0.714043  ATE  0.998719) 

(3078  the  0.1697205  1  170  THE  0.1697205  KEY  0.677721) 

(684  the  0.578504  1  3  THE  0.578504  KEY  0.657517) 

(2223  the  0.812001  2  5  THEN  0.9888485  THEN  0.9888485) 

(684  the  0.0254405  5  508  US  0.5077695  TEA  0.968306) 

(2736  the  0.8278585  1  1  THE  0.8278585  THE  0.8278585) 

(4104  the  0.2726575  2  41  THEN  0.43012297  TEN  0.5696705) 
(2736  the  0.947765  1  1  THE  0.947765  THE  0.947765) 

(3591  the  0.700835  1  2  THE  0.700835  TORE  0.749406) 

(2907  the  0.4788925  1  6  THE  0.4788925  UP  0.609221) 

(3591  will  0.43275467  9  56  WE  0.997546  WE  0.997546) 

(3249  the  0.759889  1  2  THE  0.759889  OF  0.779293) 

(2736  the  0.8803475  1  2  THE  0.8803475  FISH  0.98429) 

(2736  the  0.620148  1  22  THE  0.620148  KNICK-  0.89830697) 
(3078  the  0.49229702  1  22  THE  0.49229702  SOME  0.96171 1) 
(3591  wore  0.1744375  21  258  I~M  0.846501  ARM  0.9558705) 
(342  the  0.67778146  1  1  THE  0.67778146  THE  0.67778146) 
(2736  than  0.144986  26  394  TREE  0.77871203  IT  0.9576535) 
(5985  than  0.6719605  1  5  THAN  0.6719605  AN  0.802022) 

(3078  was  0.51489  1  17  WAS  0.51489  US  0.964689) 

(0  the  0  1  472  nil  0  CAR  0.651604) 

(2907  the  0.44461998  3  45  AIM  0.5159605  ARE  0.98403) 

(855  the  0.790439  1  6  THE  0.790439  HE~LL  0.8535045) 

(3591  was  0.9885875  1  1  WAS  0.9885875  WAS  0.9885875) 
(2907  wore  0.76599896  1  4  WAR  0.76599896  LOCK  0.8586675) 
(2565  the  0.6881505  1  1  THE  0.6881505  THE  0.6881505) 

(1539  than  0.067943  4  283  ANT  0.2357715  BUY  0.821261) 
(1881  the  0.947352  1  1  THE  0.947352  THE  0.947352) 

(4104  was  0.401308  5  210  LIGHT  0.49929652  SIT  0.8648545) 
(2052  the  0.57808053  2  21  THESE  0.600977  ARE  0.880982) 
(7353  wall  0.5117893  7  14  ALL  0.830434  ALL  0.830434) 

(5130  the  0.22679001  1  136  THE  0.22679001  CAR  0.8500055) 
(2736  the  0.9093465  1  1  THE  0.9093465  THE  0.9093465) 

(3078  the  0.33977002  1  22  THE  0.33977002  KEY  0.9144565) 
(3249  the  0.302076  3  85  COW  0.501854  LAY  0.8230875) 
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(0  the  0  1  331  nil  0  IT  0.87952554) 

(171  the  0.8373985  1  3  THE  0.8373985  HAM  0.86587) 

(171  the  0.15303999  9  385  EAR  0.42136252  TEA  0.9124595) 

(3420  the  0. 195832  1  111  THE  0. 195832  DIM  0.7437995)  ' 

(4275  we  0.881949  1  3  WE  0.881949  ARE  0.933848) 

(3933  was  0.4865135  1  72  WAS  0.4865135  TEA  0.92392254) 

(0  the  0  1  555  nil  0  IS  0.999555) 

(0  unlimited  0  1  1958  nil  0  ERA  0.836439) 

(0  the  0  1  208  nil  0  ATE  0.990514) 

(5643  wall  0.084955  22  502  LOCKED  0.419337  LAST  0.9872987) 

(0  the  0  1  356  nil  0  SON  0.89699) 

(2736  the  0.1134265  5  421  DANCE  0.4980867  KNEE  0.93237) 

(3078  wore  0.18724449  17  294  OR  0.7527705  OR  0.7527705) 

(2565  the  0.24662651  2  70  CAT  0.25278297  DONE  0.86283946) 

(0  than  0  1  692  nil  0  RAY  0.653137) 

(0  task  0  1  1065  nil  0  SAW  0.81003594) 

(2394  will  0.771551  3  4  EAR  0.833609  EAR  0.833609) 

(2565  this  0.4865255  3  33  THE  0.97639  THE  0.97639) 

(171  the  0.75475645  1  1  THE  0.75475645  THE  0.75475645) 

(4275  task  0.381098  21  486  TELLS  0.823551  CELLS  0.9142103) 

(3249  will  0.6232765  3  15  EAR  0.665694  AIR  0.963568) 

(0  this  0  1  755  nil  0  SEA  0.918846) 

(2736  the  0.5582475  1  1  THE  0.5582475  THE  0.5582475) 

(3078  the  0.5606495  2  7  THUS  0.7237975  US  0.932176) 

(684  will  0.4549725  1  50  ILL  0.4549725  CALL  0.96442103) 

(2736  trish  0.6122495  9  83  TRICK  0.7439185  EDGE  0.989782) 

(3420  the  0.8483325  1  3  THE  0.8483325  IN  0.88464653) 

(342  the  0.94074655  1  1  THE  0.94074655  THE  0.94074655) 

(3591  was  0.0752195  16  347  LIKED  0.43082923  NEAT  0.9341 13) 

(3078  the  0.17501  5  157  AND  0.24300967  READ~V_PRES  0.62223667) 
(7695  wall  0.282456  5  272  LED  0.4796063  RUN  0.8712685) 

(0  the  0  1  385  nil  0  EVERY  0.92699397) 

(0  the  0  1  655  nil  0  BE  0.985547) 

(3249  the  0.55738354  1  9  THE  0.55738354  MY  0.822937) 

(0  the  0  1  533  nil  0  KNICK-  0.559777) 

(3762  these  0.2589925  2  191  EASE  0.29168552  BE  0.872134) 

(0  than  0  1  418  nil  0  EASE  0.959365) 

(3249  the  0.26742  1  64  THE  0.26742  EVER  0.7157455) 

(342  the  0.932883  1  1  THE  0.932883  THE  0.932883) 

(2394  the  0.7092755  1  1  THE  0.7092755  THE  0.7092755) 

(2565  the  0.91542447  1  1  THE  0.91542447  THE  0.91542447) 

(1368  will  0.64829135  5  15  WE  0.97806096  WE  0.97806096) 

(2907  trish  0.84094745  1  5  TRISH  0.84094745  PLAY  0.93975246) 

(1 197  the  0.5186895  2  23  STAB  0.5315227  AN  0.8217035) 

(342  the  0.6890155  1  1  THE  0.6890155  THE  0.6890155) 
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(3078  the  0.878139  1  2  THE  0.878139  TORE  0.9235275) 

(2565  the  0.760522  1  1  THE  0.760522  THE  0.760522) 

(2907  will  0.208685  6  199  VAULT  0.5231597  TEA  0.978969) 

(3078  the  0.7097615  1  1  THE  0.7097615  THE  0.7097615) 

(3249  the  0.603384  1  1  THE  0.603384  THE  0.603384) 

(2907  the  0.7708205  1  3  THE  0.7708205  ERA  0.884822) 

(2907  the  0.9070835  1  2  THE  0.9070835  OF  0.9169755) 

(2907  the  0.6081685  2  5  THESE  0.60943866  IN  0.877829) 

(2736  the  0.716003  1  1  THE  0.716003  THE  0.716003) 

(5985  the  0.0043555  3  644  LED  0.17543  ARE  0.919311) 

(0  unlimited  0  1  2224  nil  0  IN  0.883306) 

(4788  the  0.370967  3  47  THIS  0.4145955  EASE  0.7495205) 

(5130  wall  0.22401601  16  364  US  0.5144835  SLEIGH  0.797277) 
(3420  the  0.19431551  1  115  THE  0.19431551  IS  0.74428) 

(2736  the  0.3941055  9  123  DUMB  0.6693135  SON  0.95748055) 
(684  the  0.973424  1  1  THE  0.973424  THE  0.973424) 

(2565  the  0.377638  3  50  ACT  0.48215303  TEA  0.61591 1) 

(0  this  0  1  752  nil  0  ATE  0.986086) 

(3420  the  0.050598  2  347  SILLY  0.26855233  STILL  0.85328054) 
(171  the  0.5140325  1  1  THE  0.5140325  THE  0.5140325) 

(2736  the  0.9737675  1  1  THE  0.9737675  THE  0.9737675) 

(171  the  0.7379085  1  1  THE  0.7379085  THE  0.7379085) 

(2223  the  0.377769  3  77  SLEIGH  0.57948697  SLEIGH  0.9008975) 
(3933  this  0.41908264  1  32  MISS  0.41908264  ACE  0.624796) 

(3420  the  0.0547825  1  148  THE  0.0547825  BIKE  0.8845985) 

(2565  the  0.55574  1  1  THE  0.55574  THE  0.55574) 

(3762  will  0.59611833  2  37  WALL  0.887725  ALL  0.992549) 

(2907  trish  0.8684265  1  14  TRISH  0.8684265  EAT  0.96680796) 

(0  the  0  1  565  nil  0  BE  0.864258) 

(5 13  the  0.6295435  1  2  THE  0.6295435  HA  0.63 11115) 

(3078  was  0.5228165  1  94  WAS  0.5228165  WISH  0.9261615) 

(855  the  0.1358285  15  185  ILL  0.671 1355  ILL  0.807344) 

(513  the  0.367979  1  19  THE  0.367979  RAY  0.708896) 

(2736  treats  0.66461 104  19  33  TREES  0.89496154  EARS  0.910582) 
(6498  than  0.83598197  1  7  THAN  0.83598197  AN  0.968299) 

(3078  was  0.918036  1  1  WAS  0.918036  WAS  0.918036) 

(2394  the  0.3632665  1  32  THE  0.3632665  ARE  0.752489) 

(2394  the  0.528739  1  16  THE  0.528739  DEEP  0.9178345) 

(1881  the  0.1385545  1  297  THE  0.1385545  ILL  0.980646) 

(3591  was  0.5198695  1  15  WAS  0.5198695  ON  0.918822) 

(701 1  will  0.4034745  1  35  ILL  0.4034745  ALL  0.770763) 

(2565  trish  0.585051  10  23  TEA  0.862162  TEA  0.862162) 

(513  the  0.50764847  1  17  THE  0.50764847  BE  0.957186) 

(2907  the  0.999707  1  1  THE  0.999707  THE  0.999707) 

(2736  the  0.9075155  1  1  THE  0.9075155  THE  0.9075155) 
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(2907  the  0.946249  1  2  THE  0.946249  IN  0.96514153) 

(5130  sun  0.71402  3  7  ON  0.7286515  IN  0.9005995) 

(2907  the  0.3672275  1  52  THE  0.3672275  RACK  0.823037) 

(0  worn  0  1  1 185  nil  0  OR  0.987955) 

(4104  wore  0.491969  5  51  WARM  0.63495564  ARM  0.87829196) 

(171  the  0.7414875  1  1  THE  0.7414875  THE  0.7414875) 

(2394  than  0.133799  7  201  LAID  0.32271135  IS  0.785685) 

(342  the  0.604874  1  3  THE  0.604874  QUICK  0.61750335) 

(2394  the  0.486619  1  9  THE  0.486619  OF  0.763142) 

(4275  this  0.32955334  8  264  MR.  0.4968015  TITTER  0.9013066) 

(3078  tile  0.1266185  1  210  THE  0.1266185  CRY  0.8856445) 

(342  the  0.4891925  5  10  WITH  0.6598315  WITH  0.6598315) 

(0  the  0  1  359  nil  0  KNEE  0.86991596) 

(3078  the  0.7683375  1  4  THE  0.7683375  BUY  0.999634) 

(3078  the  0.414173  1  51  THE  0.414173  SIT  0.727611) 

(5472  these  0.0053806663  1  726  CHEESE  0.0053806663  ARE  0.995406) 
(0  than  0  1  590  nil  0  BE  0.999217) 

(0  the  0  1  279  nil  0  IN  0.75825155) 

(684  task  0.15120566  37  821  START  0.4555007  PART  0.72664636) 
(2907  will  0.35023698  10  41  WE-RE  0.6189813  ARE  0.9031225) 

(0  this  0  1  650  nil  0  SEA  0.747749) 

(342  the  0.551514  1  2  THE  0.551514  QUICK  0.5647867) 

(0  task  0  1  1099  nil  0  SICK  0.820452) 

(2736  will  0.5615515  1  8  ILL  0.5615515  THE  0.8113395) 

(0  this  0  1  61 1  nil  0  ADD  0.63919747) 

(171  the  0.683971  1  1  THE  0.683971  THE  0.683971) 

(2394  the  0.865638  1  1  THE  0.865638  THE  0.865638) 

(3249  the  0.62504  1  2  THE  0.62504  WICK  0.645445) 

(2565  the  0.8065915  1  1  THE  0.8065915  THE  0.8065915) 

(3420  will  0.040328667  17  305  LOCK  0.43002  BE  0.887826) 

(3591  the  0.6385975  1  3  THE  0.6385975  ALL  0.804233) 

(3591  the  0.97034  1  1  THE  0.97034  THE  0.97034) 

(2907  the  0.6542155  1  2  THE  0.6542155  OF  0.737128) 

(3420  the  0.8365265  2  2  THUS  0.852963  THUS  0.852963) 

(0  the  0  1  497  nil  0  ALL  0.935096) 

(3420  the  0.6601355  1  6  THE  0.6601355  OF  0.821017) 

(2565  the  0.5435635  1  6  THE  0.5435635  CAMPUS  0.5656137) 

(0  these  0  1  976  nil  0  ARE  0.999778) 

(4959  than  0.079916  18  420  KNEE  0.4675295  KEY  0.894155) 

(3249  the  0.254316  1  60  THE  0.254316  EVER  0.70559454) 

(0  the  0  1  590  nil  0  LIE  0.94849) 

(4104  we  0.948938  1  2  WE  0.948938  ALL  0.951947) 

(4275  was  0.190098  2  132  CUISINE  0.35340834  IS  0.985833) 

(171  the  0.6675005  1  1  THE  0.6675005  THE  0.6675005) 

(5472  the  0.4215855  1  47  THE  0.4215855  KISSED  0.68443567) 
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(342  the  0.72726953  1  8  THE  0.72726953  WICK  0.97505605) 

(1 1286  was  0.0451375  1  944  WAS  0.0451375  OFF  0.99909496) 

(684  the  0.091617  3  228  THAN  0.22057751  BUY  0.989939) 

(2907  the  0.928851  1  1  THE  0.928851  THE  0.928851) 

(5472  treats  0.54150504  24  142  TRICK  0.77524203  TEA  0.96630645) 
(171  the  0.8175795  1  9  THE  0.8175795  HE  0.974051) 

(3249  the  0.12563601  1  423  THE  0.12563601  STICK  0.96876) 

(2907  the  0.861535  1  1  THE  0.861535  THE  0.861535) 

(0  the  0  1  578  nil  0  IN  0.899881) 

(0  unlimited  0  1  2227  nil  0  EAT  0.8252105) 

(0  the  0  1  318  nil  0  TIM  0.736835) 

(5301  wall  0.028018  8  617  LATE  0.18272734  LOT  0.91990197) 

(3249  the  0.14722951  7  167  TEA  0.5551365  AT  0.942549) 

(2565  the  0.083315  4  220  THEN  0.183807  RUN  0.72707) 

(0  the  0  1  628  nil  0  ILL  0.581221) 

(0  unlimited  0  1  2232  nil  0  EAT  0.977455) 

(3249  the  0.238753  1  115  THE  0.238753  EASE  0.747286) 

(2394  wall  0.136533  24  482  LOSS  0.5021535  COME  0.919701) 

(3249  the  0.7424165  1  5  THE  0.7424165  REST  0.98474735) 

(2907  the  0.3277725  5  22  THESE  0.57103765  EAT  0.664331) 

(3420  the  0.184094  2  157  EAT  0.448576  IT  0.804107) 

(2907  the  0.5786055  1  6  THE  0.5786055  HE  0.95671654) 

(1710  the  0.23177901  14  143  SAY  0.7617705  PAY  0.9295455) 

(2736  the  0.045766503  6  344  CEASE  0.3783015  DISH  0.855828) 

(171  the  0.563142  2  8  THESE  0.66874903  DIG  0.78841054) 

(2907  was  0.77284294  1  1  WAS  0.77284294  WAS  0.77284294) 

(5643  the  0.008665999  3  670  THAN  0.031233  AND  0.84428847) 
(3078  the  0.522916  1  2  THE  0.522916  TEA  0.53317) 

(0  treats  0  1  1698  nil  0  US  0.9660955) 

(3420  the  0.0555795  8  265  SOLID  0.22996876  LIVE~V  0.6756325) 
(4275  we  0.992549  1  1  WE  0.992549  WE  0.992549) 

(3591  was  0.1 103855  10  240  ABYSS  0.234242  NEED  0.6386025) 
(513  the  0.826066  1  1  THE  0.826066  THE  0.826066) 

(2565  the  0.0648645  2  233  WICK  0.2388435  ATE  0.965426) 

(2736  the  0.4616065  4  46  ACE  0.6350225  KNEES  0.94346297) 

(171  the  0.7417825  1  1  THE  0.7417825  THE  0.7417825) 

(1026  will  0.2440545  9  133  WICK  0.5475285  HE  0.7620995) 

(2907  trish  0.6295907  8  29  TRIP  0.9229546  TRIP  0.9229546) 

(0  the  0  1  416  nil  0  BEER  0.62897867) 

(2736  the  0.87374854  1  3  THE  0.87374854  BUY  0.922286) 

(4275  was  0.49964  1  5  WAS  0.49964  ARE  0.942503) 

(3762  the  0.773248  1  2  THE  0.773248  CALF  0.831834) 

(3933  the  0.2477015  14  261  DIM  0.444474  ROAM  0.94182366) 

(2565  the  0.304686  1  18  THE  0.304686  OF  0.4582065) 

(5301  we  0.047595  1  432  WE  0.047595  LET  0.53787297) 
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(2736  the  0.530248  1  12  THE  0.530248  LINE  0.92718947) 

(2736  the  0.7808835  1  3  THE  0.7808835  BUY  0.86987) 

(4104  was  0.748237  1  1  WAS  0.748237  WAS  0.748237) 

(684  the  0.128176  12  167  THAN  0.463631  AN  0.906402) 

(3762  the  0.35616  1  100  THE  0.35616  IF  0.863239) 

(2565  the  0.705183  1  1  THE  0.705183  THE  0.705183) 

(5301  we  0.0277665  13  329  TAN  0.3483285  ILL  0.800423) 

(2565  the  0.9637505  1  1  THE  0.9637505  THE  0.9637505) 

(0  wore  0  1  756  nil  0  ARE  0.933507) 

(2565  the  0.090088  8  225  EAR  0.363051  COST  0.7579125) 

(2736  the  0.9881815  1  1  THE  0.9881815  THE  0.9881815) 

(2907  this  0.7345645  1  1  THIS  0.7345645  THIS  0.7345645) 

(0  the  0  1  859  nil  0  ILL  0.498484) 

(513  the  0.39701098  1  41  THE  0.39701098  BUY  0.992124) 

(3078  the  0.61956847  1  7  THE  0.61956847  SWISS  0.72284967) 
(5472  these  0.2369235  1  338  THESE  0.2369235  NUT  0.760216) 
(6156  than  0.51579297  2  17  TEA  0.6244575  TEA  0.849308) 

(2736  the  0.0464105  2  254  OF  0.0464495  TEAM  0.62928265) 

(2736  the  0.84915555  1  1  THE  0.84915555  THE  0.84915555) 

(2907  the  0.4007385  1  60  THE  0.4007385  AIM  0.74190354) 

(6156  the  0.015384  7  598  LID  0.035053  SEA  0.9563885) 

(4104  unlimited  0.1 1347401  56  877  KNEE  0.5010205  BE  0.996057) 
(3420  the  0.55884004  1  21  THE  0.55884004  AIN~T  0.944098) 
(3249  wall  0.544594  1  79  WALL  0.544594  BLOCK  0.98272467) 
(3078  the  0.060517997  1  233  THE  0.060517997  CAST  0.927329) 
(342  the  0.17281699  8  265  KEY  0.466686  AT  0.9217) 

(2907  the  0.922265  1  1  THE  0.922265  THE  0.922265) 

(3420  was  0.69346803  1  10  WAS  0.69346803  ILL  0.8634765) 

(2907  the  0.248667  I  73  THE  0.248667  IT~S  0.547809) 

(2907  the  0.39461002  1  66  THE  0.39461002  IF  0.965405) 

(3249  the  0.762645  1  1  THE  0.762645  THE  0.762645) 

(2907  we  0.5809535  2  4  WE~LL  0.709991  HE~LL  0.71309) 

(3591  will  0.3941975  1  15  ILL  0.3941975  LIE  0.5468085) 

(5130  trish  0.8704037  2  6  TRICK  0.8798137  LAY  0.969091) 

(1026  the  0.6899405  1  7  THE  0.6899405  BE  0.997985) 

(342  will  0.3346825  13  55  WICK  0.641755  KEY  0.8612755) 

(1881  trish  0.44326332  30  251  TEA  0.9446345  KNEE  0.972317) 
(684  the  0.5195885  1  20  THE  0.5195885  BE  0.97741 1) 

(2736  the  0.6839325  1  1  THE  0.6839325  THE  0.6839325) 

(2736  the  0.541995  1  19  THE  0.541995  AN  0.86701) 

(3591  the  8.59e-4  14  509  ALL  0.3633925  ELSE  0.8800665) 

(3591  we  0.914141  1  2  WE  0.914141  ILL  0.9192935) 

(3762  was  0.53423804  2  23  OF  0.546967  KNEE  0.8964095) 

(2223  under  0.101728  36  450  ELL  0.44404498  LEAVE  0.821977) 
(2565  the  0.40935  1  17  THE  0.40935  WITH  0.49506152) 
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(0  under  0  1  1034  nil  0  RAY  0.927307) 

(171  the  0.644195  1  5  THE  0.644195  RAY  0.764272) 

(3933  under  0.322684  7  350  NEAR  0.53844064  LEAD  0.9329965) 
(2565  the  0.1400865  1  91  THE  0.1400865  BIKE  0.799939) 

(342  the  0.4812945  2  23  IS  0.492935  LET  0.6978255) 

(2223  the  0.34521002  3  63  THESE  0.53299564  BE  0.929587) 

(0  the  0  1  235  nil  0  NEAT  0.9254805) 

(2565  the  0.6228265  1  1  THE  0.6228265  THE  0.6228265) 

(3249  the  0.996065  1  1  THE  0.996065  THE  0.996065) 

(3591  wore  0.00825  31  900  ALL  0.447578  ILL  0.76312196) 

(2565  the  0.466874  4  39  LET~S  0.52638966  ROAST  0.9610065) 

(2565  the  0.91719496  1  1  THE  0.91719496  THE  0.91719496) 

(171  this  0.5684395  2  5  THE  0.8452945  THE  0.8452945) 

(513  the  0.0510915  5  235  DUMPED  0.3636655  IN  0.861077) 

(0  wore  0  1  931  nil  0  OF  0.765463) 

(2736  the  0.8338095  1  1  THE  0.8338095  THE  0.8338095) 

(0  the  0  1  656  nil  0  RUN  0.622997) 

(5130  this  0.38540533  6  75  GUESS  0.5864833  DESK  0.848067) 

(4275  than  0.23620832  21  158  KNEE  0.55023396  KNEE  0.906168) 
(3249  was  0.88548696  1  1  WAS  0.88548696  WAS  0.88548696) 

(4446  the  0.1 149365  4  216  MICHAEL  0.49387452  LIVE~V  0.8558395) 
(0  the  0  1  558  nil  0  CANDY  0.5451047) 

(171  the  0.720538  1  5  THE  0.720538  HE~LL  0.79148996) 

(3933  was  0.998739  1  1  WAS  0.998739  WAS  0.998739) 

(2052  the  0.76296103  1  2  THE  0.76296103  SPHERE  0.76542497) 

(2736  the  0.26055  2  83  SILLY  0.31010064  AND  0.7256235) 

(2907  the  0.293741  2  92  HONEST  0.45881552  ALL  0.7884) 

(2565  the  0.978893  1  1  THE  0.978893  THE  0.978893) 

(171  the  0.564294  2  2  THESE  0.590451  THESE  0.590451) 

(0  the  0  1  463  nil  0  AN  0.598126) 

(171  the  0.0529015  13  232  PIG  0.46038198  IS  0.977372) 

(2736  the  0.779508  1  1  THE  0.779508  THE  0.779508) 

(1 197  the  0.53478  1  15  THE  0.53478  DID  0.81160796) 

(2736  was  0.3627725  12  101  IT  0.851 107  IT  0.851 107) 

(0  the  0  1  395  nil  0  EAR  0.835273) 

(7524  wall  0.21853699  21  262  ALL  0.629417  AIR  0.7910735) 

(513  the  0.1132985  2  114  THOU  0.139966  SEA  0.796709) 

(2907  the  0.569126  1  9  THE  0.569126  SIT  0.6418495) 

(4446  the  0.3367365  3  14  MICHAEL  0.4360955  MICHAEL  0.4360955) 
(2565  the  0.8846445  1  5  THE  0.8846445  BUY  0.987337) 

(2736  the  0.81858397  1  3  THE  0.81858397  BUY  0.969478) 

(4275  was  0.977543  1  1  WAS  0.977543  WAS  0.977543) 

(3249  the  0.8791 11  12  THE  0.8791 1 1  ERA  0.981 18055) 

(3249  the  0.5387715  1  5  THE  0.5387715  OF  0.961559) 

(2565  the  0.34568697  1  19  THE  0.34568697  COUNT  0.59965533) 
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(2736  we  0.513766  3  8  WE~RE  0.61372036  EAR  0.6751985) 

(2565  the  0.5324095  2  4  THIS  0.7468715  HIS  0.873003) 

(3249  was  0.47012052  1  22  WAS  0.47012052  ARM  0.8285225) 

(2565  the  0.40560502  7  21  STILL  0.4694635  IT  0.60864) 

(6156  wall  0.19532901  1  233  BALL  0.19532901  RAN  0.62830335) 

(2736  the  0.456375  1  1  THE  0.456375  THE  0.456375) 

(2565  the  0.66212  1  2  THE  0.66212  MAY  0.664955) 

(3078  was  0.5509845  1  12  WAS  0.5509845  IT  0.8173045) 

(2565  the  0.0783265  4  126  ACT  0.38905367  ART  0.667902) 

(4275  wall  0.545503  1  7  WALL  0.545503  ALL  0.999688) 

(0  the  0  1  358  nil  0  TRAY  0.879961) 

(2565  the  0.580187  1  4  THE  0.580187  IT  0.6887785) 

(2565  the  0.49063  1  13  THE  0.49063  WICK  0.65976346) 

(2565  the  0.510821  3  21  THOU  0.7228735  COME  0.8583255) 

(3420  the  0.062272  1  244  THE  0.062272  AIN~T  0.5020493) 

(2736  the  0.872897  1  1  THE  0.872897  THE  0.872897) 

(2907  the  0.5485095  2  18  THIS  0.6900365  ICE  0.9056915) 

(2907  the  0.348525  3  65  US  0.51483697  IT~S  0.8443765) 

(5130  sun  0.97477055  1  1  SON  0.97477055  SON  0.97477055) 

(3249  the  0.29002398  2  126  SPHERE  0.30418935  READ-V_PAST  0.8179465) 
(0  worn  0  1  1525  nil  0  RIM  0.89601 1) 

(513  the  0.8925515  1  2  THE  0.8925515  HIM  0.8942175) 

(2565  the  0.478013  5  17  IT~S  0.55102897  ARE  0.711627) 

(2394  the  0.8770805  1  1  THE  0.8770805  THE  0.8770805) 

(2907  sun  0.7374893  3  7  SORRY  0.80078197  THE  0.89327097) 

(2394  the  0.8465765  1  1  THE  0.8465765  THE  0.8465765) 

(0  worn  0  1  1278  nil  0  ERA  0.79170704) 

(3420  under  0.41614333  1  198  UNDER  0.41614333  RUN  0.8709607) 

(855  the  0.85718  1  1  THE  0.85718  THE  0.85718) 

(2736  the  0.0128285  3  508  THESE  0.12273934  TEA  0.92637706) 

(0  unlimited  0  1  1900  nil  0  AT  0.957312) 

(3249  the  0.508462  3  35  TELL  0.53018063  AIN~T  0.792299) 

(3762  wall  0.1073705  15  493  DOLL  0.44354752  DOLL  0.89343655) 

(3420  the  0.076607  5  138  TEA  0.305827  LIST  0.8943275) 

(1881  the  0.039697  12  810  KNEE  0.573208  AN  0.977269) 

(3249  task  0.31153068  12  307  TORE  0.6106235  SAW  0.90540004) 

(2052  will  0.40695348  3  45  EAR  0.55227953  EAR  0.831273) 

(0  this  0  1  565  nil  0  MUST  0.814269) 

(2565  the  0.649042  1  1  THE  0.649042  THE  0.649042) 

(0  the  0  1  427  nil  0  ALL  0.99274) 

(171  the  0.355927  2  63  AM  0.50849754  BUY  0.999531) 

(1026  the  0.494549  4  49  SPHERE  0.58674335  TRAY  0.91108453) 

(3762  these  0.38062197  2  175  EAT  0.5328275  ARE  0.995955) 

(0  than  0  1  339  nil  0  EASE  0.99983) 

(2736  the  0.2910715  1  19  THE  0.2910715  BUY  0.746633) 
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(0  the  0  1  577  nil  0  OR  0.924294) 

(3762  we  0.904988  1  1  WE  0.904988  WE  0.904988) 

(3591  was  0.696374  1  5  WAS  0.696374  ALL  0.951223) 

(2736  the  0.8847255  1  1  THE  0.8847255  THE  0.8847255) 

(3591  was  0.648481  4  5  WISH  0.902261  WISH  0.902261) 

(513  the  0.16047199  6  279  SOLID  0.350133  TEN  0.71072996) 
(342  the  0.367831  1  40  THE  0.367831  RAY  0.8856255) 

(0  treats  0  1  1507  nil  0  TEA  0.9799825) 

(4104  wore  0.504042  5  72  WARM  0.66869336  ARM  0.930055) 
(2736  the  0.3791865  1  23  THE  0.3791865  GUM  0.5295945) 

(3078  than  0.060658  14  347  DECK  0.1210285  ROAM  0.6681755) 
(5472  than  0.57458097  1  3  THAN  0.57458097  EYES  0.735682) 
(3762  was  0.323644  1  37  WAS  0.323644  US  0.693246) 

(0  the  0  1  573  nil  0  ARE  0.97406) 

(2565  the  0.61632645  1  12  THE  0.61632645  ILL  0.972592) 

(2565  the  0.516213  1  35  THE  0.516213  HE  0.90698755) 

(3591  was  0.94465554  1  1  WAS  0.94465554  WAS  0.94465554) 
(2736  the  0.8844875  1  1  THE  0.8844875  THE  0.8844875) 

(2907  the  0.82390654  1  6  THE  0.82390654  ICE  0.940022) 

(3249  the  0.7257505  1  5  THE  0.7257505  IT  0.7508135) 

(5130  sun  0.986646  1  1  SON  0.986646  SON  0.986646) 

(2736  the  0.22590101  2  157  THESE  0.254147  RIGHT  0.98105097) 
(4275  worn  0.34513667  1  292  WARN  0.34513667  AM  0.997683) 
(342  the  0.3665235  4  42  KEY  0.50267  IN  0.7698285) 

(4104  the  0.0301695  7  266  IT  0.045665  LEAD  0.497779) 

(3078  the  0.69218  1  5  THE  0.69218  BUY  0.92516) 

(3078  the  0.057348497  11  362  SEA  0.503851  IS  0.998231) 

(0  unlimited  0  1  1615  nil  0  TEA  0.9905995) 

(3249  the  0.6444045  1  3  THE  0.6444045  ODD  0.6794715) 

(5985  wall  0.45591798  4  132  ALL  0.768154  SAW  0.87883604) 
(4788  the  0.5741235  1  24  THE  0.5741235  REST  0.884682) 

(2565  the  0.4997175  3  57  THAN  0.95463455  MAN  0.9604145) 

(0  the  0  1  574  nil  0  ALL  0.867901) 

(171  the  0.317881  2  45  THUS  0.42208648  MY  0.958849) 

(2907  the  0.19056699  1  143  THE  0.19056699  SINCE  0.696683) 
(3420  these  0.110496506  9  310  OAK  0.3581985  ARE  0.95186245) 
(684  than  0.26863867  3  1 14  KNEE  0.3464675  BE  0.952617) 

(3249  the  0.1473715  4  193  AND  0.15726833  RAIN  0.585546) 

(684  the  0.885381  1  1  THE  0.885381  THE  0.885381) 

(2736  the  0.31 14515  1  72  THE  0.3114515  CAMP  0.59570533) 
(2736  than  0.24102099  10  180  DID  0.47575533  IN  0.799693) 

(2907  was  0.4602595  1  16  WAS  0.4602595  TEA  0.7049305) 

(3078  the  0.2591405  5  219  ENDED  0.37910974  AT  0.8729615) 

(0  the  0  1  414  nil  0  TEA  0.9590955) 

(1026  the  0.378219  1  84  THE  0.378219  HE  0.9451825) 


323 


(3249  was  0.5215455  1  4  WAS  0.5215455  IS  0.7262085) 

(171  the  0.7184425  2  2  THESE  0.732487  THESE  0.732487) 
(2565  the  0.2598435  1  56  THE  0.2598435  TRISH  0.585104) 

(0  this  0  1  542  nil  0  IT  0.74026954) 

(4617  the  0.27820697  1  82  THE  0.27820697  IT  0.6266525) 

(171  the  0.591943  1  1  THE  0.591943  THE  0.591943) 

(2394  the  0.97185445  1  1  THE  0.97185445  THE  0.97185445) 
(2736  the  0.078334495  1  194  THE  0.078334495  ICE  0.917104) 
(2394  the  0.8094075  1  1  THE  0.8094075  THE  0.8094075) 

(4959  sun  0.889925  1  4  SON  0.889925  I~D  0.8919385) 

(2394  the  0.518238  1  24  THE  0.518238  DISH  0.80822647) 

(5130  worn  0.38571966  5  297  IRATE  0.632631  AID  0.951686) 
(3078  the  0.570474  1  6  THE  0.570474  WITH  0.79149854) 

(3249  the  0.3032795  1  45  THE  0.3032795  KNEE  0.741986) 
(3591  the  0.6634635  1  5  THE  0.6634635  TEA  0.83302104) 
(2565  the  0.5287935  1  14  THE  0.5287935  ARE  0.872629) 

(3078  task  0.5400837  17  102  TOWN  0.81 13575  OFF  0.9610625) 
(2394  will  0.1946595  12  148  EAR  0.4307685  THE  0.75644195) 
(0  this  0  1  737  nil  0  SEA  0.752483) 

(2565  the  0.521795  1  13  THE  0.521795  WITH  0.7529865) 

(171  the  0.614286  1  1  THE  0.614286  THE  0.614286) 

(3249  was  0.5069605  2  10  OF  0.5078665  WE  0.981282) 

(1 197  the  0.397429  1  32  THE  0.397429  OF  0.720441) 

(3762  the  0.444929  2  45  TRAY  0.6226835  RAY  0.949067) 

(0  treats  0  1  2146  nil  0  TAN  0.644902) 

(2565  the  0.6274515  1  1  THE  0.6274515  THE  0.6274515) 

(2565  the  0.18962  4  128  EAR  0.526505  KNEE  0.772608) 

(2565  the  0.873248  1  3  THE  0.873248  BUY  0.886979) 

(3078  the  0.9262695  1  1  THE  0.9262695  THE  0.9262695) 

(3420  the  0.598792  1  3  THE  0.598792  ILL  0.667543) 

(2394  the  0.5439255  2  15  THESE  0.578362  ALL  0.996771) 
(1368  will  0.4198297  5  62  ILL  0.715159  BE  0.925489) 

(3249  the  0.748232  1  6  THE  0.748232  RIM  0.85 1 109) 

(2736  the  0.81524503  1  1  THE  0.81524503  THE  0.81524503) 
(2565  the  0.787722  1  4  THE  0.787722  ANT  0.822792) 

(2736  the  0.637925  1  6  THE  0.637925  EAR  0.770397) 

(0  task  0  1  1445  nil  0  LAUGH  0.912956) 

(2394  will  0.912969  3  3  EAR  0.9555435  EAR  0.9555435) 

(0  this  0  1  528  nil  0  SEA  0.933863) 

(2907  the  0.6822185  1  2  THE  0.6822185  TEA  0.6936805) 

(2394  the  0.58695203  1  12  THE  0.58695203  CAR  0.943241) 
(3591  was  0.3391335  6  206  ATE  0.47452098  IN  0.950606) 

(0  the  0  1  601  nil  0  ILL  0.732722) 

(2394  the  0.1526845  3  49  TIN  0.178903  RAY  0.5830445) 

(0  treats  0  1  1549  nil  0  ATE  0.986179) 
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(0  under  0  1  1077  nil  0  LAY  0.764654) 

(2736  the  0.3798635  2  9  THESE  0.38235402  HE  0.4852555) 
(2736  the  0.599431  1  3  THE  0.599431  HIM  0.641549) 

(2907  was  0.89277  1  3  WAS  0.89277  ILL  0.923845) 

(2565  the  0.2499485  2  79  TEA  0.3125885  LID  0.945893) 

(5301  wall  0.48219267  3  25  ALL  0.619242  OF  0.852644) 

(2736  the  0.072713494  1  83  THE  0.072713494  IT~S  0.64876866) 
(2565  the  0.713495  1  1  THE  0.713495  THE  0.713495) 

(2394  the  0.0380145  16  237  WITH  0.568051  HE  0.67684) 

(2907  the  0.044775  7  316  KNEE  0.458973  TEA  0.887658) 

(2394  the  0.4737945  4  54  AND  0.6074353  US  0.86330354) 

(3420  the  0.68855155  1  1  THE  0.68855155  THE  0.68855155) 
(4788  was  0.49388748  1  42  WAS  0.49388748  US  0.957709) 

(0  the  0  1  446  nil  0  ERRORS  0.815972) 

(0  the  0  1  497  nil  0  IF  0.645151) 

(2394  the  0.730102  1  2  THE  0.730102  OF  0.802235) 

(2394  we  0.041088  18  342  THE  0.639168  OF  0.988577) 

(171  the  0.7944405  1  1  THE  0.7944405  THE  0.7944405) 

(2223  the  0.3192675  1  50  THE  0.3192675  HE  0.7575605) 

(3249  the  0.037294  4  536  THESE  0.260927  WICK  0.7434525) 
(2736  the  0.0903705  10  278  TEST  0.46350667  IS  0.85717297) 
(2736  the  0.720841  1  1  THE  0.720841  THE  0.720841) 

(2736  the  0.93147  1  1  THE  0.93147  THE  0.93147) 

(3078  the  0.38100052  2  65  CEASE  0.41 1668  ALL  0.9821) 

(3078  the  0.648998  1  2  THE  0.648998  ALL  0.866767) 

(3420  the  0.076109  5  183  THESE  0.1524515  ROAST  0.541912) 
(3762  these  0.0900105  2  304  MILK  0.10504267  ARE  0.931357) 
(5814  than  0.13678633  9  427  KNEE  0.4808355  BE  0.994737) 
(1881  the  0.65347254  1  2  THE  0.65347254  EVER  0.9357265) 
(2565  the  0.54824  1  5  THE  0.54824  ARE  0.837643) 

(2736  the  0.9329845  1  1  THE  0.9329845  THE  0.9329845) 

(2394  the  0.1741075  6  382  THAN  0.668051  AN  0.994553) 

(2223  the  0.6594045  1  1  THE  0.6594045  THE  0.6594045) 

(3249  the  0.656476  1  2  THE  0.656476  IF  0.6960795) 

(684  the  0.544517  1  1  THE  0.544517  THE  0.544517) 

(342  the  0.3412185  4  51  RAKED  0.452368  TEA  0.688361) 

(2565  the  0.587179  1  8  THE  0.587179  KNEE  0.8451785) 

(342  will  0.3283435  2  28  ELL  0.4054775  BE  0.985708) 

(3249  the  0.5566595  1  9  THE  0.5566595  NUMB  0.83983505) 
(171  the  0.7084625  1  1  THE  0.7084625  THE  0.7084625) 

(2907  the  0.595182  1  1  THE  0.595182  THE  0.595182) 

(2736  the  0.7985785  1  1  THE  0.7985785  THE  0.7985785) 

(4275  the  0.758478  2  3  THIS  0.90650046  THIS  0.90650046) 
(2565  was  0.5711213  7  14  WE  0.927827  WE  0.927827) 

(2736  the  0.4730235  1  7  THE  0.4730235  EAR  0.6674705) 
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(7353  wall  0.15435965  2  246  KNEE  0.2691325  RAY  0.990993) 
(1026  the  0.2649295  1  136  THE  0.2649295  EAR  0.6789345) 
(9405  than  0.41239634  1  84  CAN  0.41239634  SOME  0.744279) 
(3933  was  0.85799754  1  1  WAS  0.85799754  WAS  0.85799754) 
(3420  the  0.267525  1  59  THE  0.267525  OF  0.7039045) 

(2394  the  0.66391253  1  10  THE  0.66391253  ARE  0.986791) 

(171  the  0.49842697  6  25  WITH  0.631356  ELL  0.844844) 

(3249  was  0.3979935  1  84  WAS  0.3979935  US  0.935705) 

(2223  the  0.47616702  1  7  THE  0.47616702  DRESS  0.621867) 

(0  wore  0  1  1035  nil  0  ARE  0.995171) 

(2565  the  0.5496825  3  45  LESS  0.5513585  LAW  0.87691545) 
(2907  the  0.7680865  1  1  THE  0.7680865  THE  0.7680865) 

(342  this  0.68866146  2  15  THE  0.868927  EAR  0.90557647) 

(342  the  0.51773846  1  4  THE  0.51773846  EAT  0.64194) 

(3420  the  0.3489655  2  40  DIM  0.48127002  CAMP  0.61487037) 
(171  the  0.674151  1  3  THE  0.674151  BUY  0.84176) 

(0  the  0  1  391  nil  0  LAKE  0.6043825) 

(2565  the  0.7273745  1  6  THE  0.7273745  BUY  0.994386) 

(4617  the  0.284348  1  77  THE  0.284348  HE  0.978692) 

(2565  the  0.4124895  4  50  EGG  0.46949401  HE  0.9799795) 

(3420  the  0.4917135  4  45  THOU  0.5817265  AL  0.8604095) 

(2736  the  0.4286735  3  9  US  0.5552965  US  0.5552965) 

(2565  the  0.97063446  1  1  THE  0.97063446  THE  0.97063446) 
(2565  the  0.033820502  23  342  TAN  0.654716  TAN  0.7684565) 
(3078  the  0.48459  1  43  THE  0.48459  ROSE  0.88223004) 

(513  the  0.89774346  1  2  THE  0.89774346  OF  0.954969) 

(2565  the  0.739455  1  3  THE  0.739455  BUY  0.829665) 

(4104  was  0.5651245  1  6  WAS  0.5651245  RUN  0.80427456) 
(3078  the  0.4023545  2  89  DISH  0.602806  CASH  0.988222) 

(3249  the  0.383938  2  46  LET  0.4291895  SAW  0.603505) 

(2565  the  0.5155515  2  16  THEN  0.59611803  DOWN  0.63386565) 
(3591  we  0.4164355  1  6  WE  0.4164355  AM  0.789955) 

(0  under  0  1  1 171  nil  0  KNEE  0.915064) 

(171  the  0.53724897  1  1  THE  0.53724897  THE  0.53724897) 

(513  the  0.3899035  5  54  WE  0.6852085  KEYED  0.9386215) 

(2907  the  0.33946902  1  4  THE  0.33946902  GRIP  0.34845567) 
(3078  the  0.658926  1  7  THE  0.658926  BUY  0.978749) 

(3249  the  0.899697  1  1  THE  0.899697  THE  0.899697) 

(3933  wore  0.618259  2  22  WARM  0.6944683  OR  0.950359) 

(2565  the  0.87570655  1  1  THE  0.87570655  THE  0.87570655) 

(513  the  0.066413  13  328  IN  0.476162  RUN  0.6687617) 

(4275  this  0.48304  6  29  US  0.846438  US  0.846438) 

(171  the  0.926408  1  1  THE  0.926408  THE  0.926408) 

(2394  the  0.1291385  3  234  THAN  0.6124715  ANT  0.9857515) 
(2907  the  0.026342  7  366  EASE  0.165407  NEED  0.6587335) 
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(171  the  0.57189  1  1  THE  0.57189  THE  0.57189) 

(171  the  0.50918  1  9  THE  0.50918  KEY  0.917469) 

(0  the  0  1  415  nil  0  SING  0.80050254) 

(342  the  0.586075  3  18  THEN  0.695367  KEY  0.94819653) 

(0  the  0  1  431  nil  0  SEA  0.794628) 

(3420  the  0.70653546  1  1  THE  0.70653546  THE  0.70653546) 

(4104  we  0.769319  1  1  WE  0.769319  WE  0.769319) 

(3933  was  0.50662696  1  38  WAS  0.50662696  AN  0.949837) 

(2394  the  0.8151225  1  1  THE  0.8151225  THE  0.8151225) 

(0  wore  0  1  835  nil  0  CEASE  0.6690585) 

(2394  the  0.780717  1  2  THE  0.780717  US  0.899222) 

(2394  the  0.622519  1  3  THE  0.622519  PLAY  0.64398) 

(171  this  0.61273754  1  13  THIS  0.61273754  IS  0.796198) 

(2394  the  0.9222505  1  1  THE  0.9222505  THE  0.9222505) 

(2565  the  0.391264  1  19  THE  0.391264  LIE  0.614771) 

(2565  the  0.424284  1  1  THE  0.424284  THE  0.424284) 

(3591  will  0.33069268  4  100  ILL  0.554969  BE  0.998243) 

(3249  the  0.432243  1  25  THE  0.432243  KNEE  0.857263) 

(342  the  0.575597  1  7  THE  0.575597  EAT  0.743773) 

(2736  the  0.201754  1  108  THE  0.201754  ICE  0.5176735) 

(2907  the  0.720897  1  4  THE  0.720897  OF  0.990163) 

(2394  the  0.848517  1  1  THE  0.848517  THE  0.848517) 

(2052  under  0.39828435  1  53  UNDER  0.39828435  IN  0.7187875) 

(2736  the  0.946296  1  3  THE  0.946296  BUY  0.994639) 

(2736  the  0.546872  1  3  THE  0.546872  WITH  0.689233) 

(2565  the  0.1846725  1  150  THE  0.1846725  IF  0.682848) 

(4104  this  0.286569  2  167  SIT  0.4298535  KISSED  0.7499335) 

(4446  the  0.0431965  5  247  SWIMMING  0.14181349  TRICK  0.56302196) 
(3078  the  0.5112615  1  6  THE  0.5112615  WITH  0.665482) 
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APPENDIX  H.  FREQUENCY  NORMALIZATION  DATA 


A.  DESCRIPTION 

The  following  table  lists  data  used  to  develop  the  frequency  normalization  method 
described  in  Chapter  IV. 


B.  DATA 


TIMIT  Speech  File 

Sample 

Nuunber 

FI  Freq 

(Hz) 

F2  Freq 

(Hz) 

F3  Freq 

(Hz) 

Pitch 

(Hz) 

F : /TIMIT/TRAIN/DR2/FAEM0/SX132 .WAV 

23,911 

333 

2,369 

3,026 

172 

F: /TIMIT/TRAIN/DR2/FAEM0/SX132 .WAV 

24,167 

349 

2,361 

2,856 

117 

F: /TIMIT/TRAIN/DR2/FAJW0/SX183 .WAV 

23,287 

479 

2,540 

3,075 

160 

F: /TIMIT/TRAIN/DR2/FAJW0/SX183  .WAV 

23,543 

503 

2,726 

3,164 

162 

F: /TIMIT/TRAIN/DR2/FCAJ0/SX129 .WAV 

495 

2,523 

3,124 

158 

F : /TIMIT/TRAIN/DR2/FCAJ0/SX129 .WAV 

33,435 

471 

2,588 

3,172 

158 

F : / TIMIT/ TRAIN/ DR2 /FCAJO / SX12  9 . WAV 

33,947 

479 

2,596 

2,986 

154 

F: /TIMIT/TRAIN/DR2/FCMM0/SX183 .WAV 

13,497 

568 

2,418 

3,181 

190 

F: /TIMIT/TRAIN/DR2/FCMM0/SX183 .WAV 

13,753 

519 

2,475 

3,043 

190 

F: /TIMIT/TRAIN/DR2/FCYL0/SX127 .WAV 

17,323 

300 

198 

F: /TIMIT/TRAIN/DR2/FCYL0/SX127 .WAV 

18,091 

365 

2,767 

3,140 

205 

F: /TIMIT/TRAIN/DR2/FDAS1/SX21 .WAV 

21,065 

446 

2,207 

2,872 

195 

F: /TIMIT/TRAIN/DR2/FDAS1/SX21 .WAV 

21,321 

381 

2,345 

3,002 

182 

F: /TIMIT/TRAIN/DR2/FDAS1/SX21 .WAV 

21,577 

406 

2,426 

3,034 

178 

F: /TIMIT/TRAIN/DR2/FDAS1/SX21 .WAV 

22,089 

406 

2,491 

2,848 

184 

F: /TIMIT/TRAIN/DR2/FDNC0/SX18 .WAV 

12,455 

519 

2,402 

3,043 

193 

F: /TIMIT/TRAIN/DR2/FDXW0/SX161 .WAV 

16,961 

373 

2,548 

3,083 

178 

F : /TIMIT/ TRAIN/DR2 / FDXWO / SXl 6 1 . WAV 

17,217 

422 

2,475 

3,018 

178 

F : /TIMIT/TRAIN/DR2 /FDXWO /SX161 .WAV 

17,473 

511 

2,442 

3,002 

178 

F: /TIMIT/TRAIN/DR2/FEAC0/SX255 .WAV 

36,743 

389 

2,199 

2,718 

152 

F: /TIMIT/TRAIN/DR2/FHLM0/SX120 .WAV 

38,551 

365 

2,791 

3,383 

198 

F: /TIMIT/TRAIN/DR2/FHLM0/SX120 .WAV 

38,807 

389 

2,799 

3,083 

200 

F: /TIMIT/TRAIN/DR2/FHLM0/SX120 .WAV 

39,319 

414 

2,580 

3,262 

200 

F: /TIMIT/TRAIN/DR2/FHLM0/SX120 .WAV 

39,575 

406 

2,604 

3,132 

195 

F: /TIMIT/TRAIN/DR2/FHLM0/SX120 .WAV 

39,831 

406 

2,645 

3,099 

190 

F: /TIMIT/TRAIN/DR2/FKAA0/SX128 .WAV 

17,335 

454 

2,320 

2,929 

184 

F :/ TIMIT / TRAIN/ DR2 / FKAAO / SXl 2  8 . WAV 

17,847 

414 

2,385 

3,132 

184 

329 


F : /TIMIT/TRAIN/DR2/FKAA0/SX128 .WAV 

18,103 

341 

2,361 

3,156 

182 

F: /TIMIT/TRA1N/DR2/FLMA0/SX253 .WAV 

5,811 

365 

2,507 

3,018 

239 

F: /TIMIT/TRAIN/DR2/FLMA0/SX253 .WAV 

6,067 

487 

2,556 

3,043 

235 

F: /TIMIT/TRAIN/DR2/FLMA0/SX253 .WAV 

6,323 

479 

2,718 

3,083 

232 

F: /TIMIT/TRAIN/DR2/FMJB0/SX187 .WAV 

24,783 

763 

2,450 

3,010 

180 

F : / TIMIT /TRAIN/DR2 / FMJBO / SXl 8  7 . WAV 

25,039 

771 

2,540 

3,034 

180 

F; /TIMIT/TRAIN/DR2/FMJB0/SX187 .WAV 

25,807 

260 

2,540 

BEE 

184 

F : /TIMIT/TRAIN/DR2 /FMKFO /SX2  7  6 .WAV 

16,302 

414 

2,426 

2,970 

193 

F: /TIMIT/TRAIN/DR2/FMKF0/SX276 .WAV 

16,558 

365 

2,418 

2,913 

193 

F ; /TIMIT / TRAIN/ DR2 / FMMHO / SXl 8  7 . WAV 

20,961 

333 

2,734 

3,286 

178 

F: /TIMIT/TRAIN/DR2/FMMH0/SX187 .WAV 

21,217 

381 

2,702 

3,043 

176 

F: /TIMIT/TRAIN/DR2/FPJF0/SX236 .WAV 

31,758 

462 

2,426 

3,132 

186 

F : /TIMIT/TRAIN/DR2 /FPJFO / SX2  3  6 . WAV 

32,014 

503 

2,580 

3,018 

184 

F : /TIMIT/TRAIN/DR2/FRLL0/SX164 . WAV 

26,551 

438 

2,329 

2,986 

213 

F : /TIMIT/TRAIN/DR2 /FRLLO / SXl 64 . WAV 

27,063 

381 

2,507 

2,864 

203 

F : / TIMIT / TRAIN/ DR2 /FRLLO /SXl 6 4 .WAV 

27,575 

454 

2,467 

2,775 

208 

F : /TIMIT/TRAIN/DR2 /FSCNO / SX17  6 . WAV 

69,415 

495 

2,101 

3,010 

186 

F : /TIMIT/TRAIN/DR2/FSCN0/SX176 .WAV 

69,671 

479 

2,199 

2,856 

178 

F: /TIMIT/TRAIN/DR2/FSCN0/SX176 .WAV 

69,927 

495 

2,207 

2,921 

172 

F: /T1MIT/TRAIN/DR2/FSCN0/SX176 .WAV 

70,183 

487 

2,256 

2,986 

172 

F: /TIMIT/TRAIN/DR2/FSCN0/SX176 .WAV 

70,439 

495 

2,280 

2,986 

170 

F:/TIMIT/TRAIN/DR2/FSCN0/SX176.WAV 

71,463 

438 

2,442 

2,945 

168 

F: /TIMIT/TRA1N/DR2/FSCN0/SX176 .WAV 

71,719 

438 

2,499 

2,929 

167 

F:  /TIMIT /TRAIN/DR2/ FSCNO /SX17 6  .WAV 

71,975 

438 

2,450 

2,937 

167 

F: /TIMIT/TRAIN/DR2/FSKL0/SX179 .WAV 

39,738 

438 

2,012 

2,385 

142 

F: /TIMIT/TRAIN/DR2/FSKL0/SX179 .WAV 

40,250 

349 

2,199 

2,686 

119 

F: /TIMIT/TRAIN/DR2/FSKL0/SX179 .WAV 

40,506 

341 

2,166 

2,621 

122 

F : /TIMIT/TRAIN/DR2 /FSRHO / SX2  2 1 . WAV 

17,597 

487 

2,320 

3,099 

262 

F: /TIMIT/TRAIN/DR2/FTMG0/SX272 .WAV 

10,147 

414 

2,621 

3,205 

211 

F: /TIMIT/TRAIN/DR2/FTMG0/SX272 .WAV 

11,427 

422 

2,564 

2,978 

229 

F: /TIMIT/TRAIN/DR2/MARC0/SX18 .WAV 

9,362 

422 

1,955 

2,410 

133 

F: /TIMIT/TRAIN/DR2/MARC0/SX18 .WAV 

9,618 

341 

2,037 

2,572 

132 

F: /TIMIT/TRAIN/DR2/MARC0/SX18 .WAV 

9,874 

325 

2,118 

2,572 

129 

F: /TIMIT/TRAIN/DR2/MBJV0/SX167 .WAV 

38,187 

316 

1,663 

3,010 

101 

F: /TIMIT/TRAIN/DR2/MBJV0/SX167 .WAV 

38,443 

325 

1,769 

3,034 

102 

F ; /TIMIT/TRAIN/DR2/MBJV0/SX167 . WAV 

38,699 

341 

1,761 

3,051 

103 

F : /TIMIT/TRAIN/DR2 /MB JVO / SXl 67 . WAV 

38,955 

438 

1,615 

3,059 

102 

F: /TIMIT/TRAIN/DR2/MBJV0/SX167 .WAV 

39,211 

511 

1,469 

3,026 

101 

F: /TIMIT/TRAIN/DR2/MCEW0/SX272 .WAV 

8,192 

430 

1,631 

2,345 

178 

F; /TIMIT/TRAIN/DR2/MCEW0/SX272 .WAV 

8,704 

308 

2,296 

3,440 

190 

F; /TIMIT/TRAIN/DR2/MCEW0/SX272 .WAV 

9,216 

268 

2,288 

3,497 

203 

F: /TIMIT/TRAIN/DR2/MCEW0/SX272 .WAV 

9,472 

243 

2,223 

3,497 

195 

F: /TIMIT/TRAIN/DR2/MDBP0/SX258 .WAV 

45,484 

300 

1,826 

2,580 

105 

F: /TIMIT/TRAIN/DR2/MDBP0/SX258 .WAV 

46,252 

260 

2,085 

2,621 

93 

F: /TIMIT/TRAIN/DR2/MDBP0/SX258 .WAV 

46,508 

316 

2,045 

2,629 

101 

F: /TIMIT/TRAIN/DR2/MDEM0/SX248 .WAV 

6,398 

292 

2,045 

2,572 

203 

F; /TIMIT/TRAIN/DR2/MDEM0/SX248 .WAV 

6,654 

381 

2,101 

2,491 

198 

F: /TIMIT/TRAIN/DR2/MDLB0/SX136 .WAV 

62,057 

519 

2,353 

2,775 

105 

F: /TIMIT/TRAIN/DR2/MDLB0/SX136 .WAV 

62,569 

503 

2,418 

2,686 

95 

F: /TIMIT/TRAIN/DR2/MDLC2/SX174 .WAV 

35,635 

316 

2,045 

2,450 

95 

F: /TIMIT/TRAIN/DR2/MDLC2/SX174 .WAV 

36,147 

308 

1,801 

2,215 

91 

F: /TIMIT/TRAIN/DR2/MDMT0/SX122 .WAV 

13,575 

325 

2,223 

2,759 

125 

F: /TIMIT/TRAIN/DR2/MDMT0/SX122 .WAV 

13,831 

268 

2,215 

2,864 

120 

F; /TIMIT/TRAIN/DR2/MDMT0/SX122 .WAV 

14,087 

268 

2,191 

2,815 

116 

F: /TIMIT/TRAIN/DR2/MDMT0/SX122 .WAV 

14,343 

292 

2,166 

2,726 

114 

F: /TIMIT/TRAIN/DR2/MDPS0/SX179 .WAV 

41,994 

292 

1,858 

2,515 

69 

F: /TIMIT/TRAIN/DR2/MDPS0/SX179 .WAV 

42,250 

284 

1,850 

2,385 

74 

F: /TIMIT/TRAIN/DR2/MDPS0/SX179 .WAV 

42,506 

252 

1,842 

2,320 

78 

F:/TIMIT/TRAIN/DR2/MDSS0/SX261.WAV 

22,443 

292 

2,037 

2,824 

119 

F: /TIMIT/TRAIN/DR2/MDWD0/SX450 .WAV 

19,936 

398 

1,874 

3,270 

133 

F: /TIMIT/TRAIN/DR2/MDWD0/SX450 .WAV 

20,448 

446 

1,931 

2,556 

127 

F: /TIMIT/TRAIN/DR2/MEFG0/SX105 .WAV 

23,069 

462 

1,477 

2,377 

96 

F; /TIMIT/TRAIN/DR2/MEFG0/SX105 .WAV 

23,325 

430 

1,655 

2,385 

96 

F: /TIMIT/TRAIN/DR2/MEFG0/SX105 .WAV 

23,581 

300 

1,793 

2,304 

96 

F : /TIMIT/TRAIN/DR2/MEFG0/SX105 . WAV 

23,837 

308 

1,882 

2,377 

97 

F: /TIMIT/TRAIN/DR2/MEFG0/SX105 .WAV 

24,093 

292 

1,931 

2,369 

94 

F: /TIMIT/TRAIN/DR2/MHRM0/SX148 .WAV 

8,634 

260 

2,239 

2,580 

168 

F: /TIMIT/TRAIN/DR2/MJAE0/SX174 .WAV 

40,071 

446 

1,817 

3,116 

120 

F: /TIMIT/TRAIN/DR2/MJAE0/SX174 .WAV 

40,327 

422 

1,834 

3,124 

122 

F: /T1M1T/TRAIN/DR2/MJBG0/SX152 .WAV 

6,855 

284 

2,166 

2,921 

150 

F : /TIMIT/TRAIN/DR2/MJBG0/SX152 .WAV 

7,111 

284 

2,183 

2,937 

158 

F: /TIMIT/TRAIN/DR2/MJBG0/SX152 .WAV 

7,367 

300 

2,191 

2,953 

162 

F: /TIMIT/TRAIN/DR2/MJBG0/SX152 .WAV 

7,623 

300 

2,191 

2,767 

163 

F: /TIMIT/TRAIN/DR2/MJDE0/SX130 .WAV 

3,351 

252 

1,939 

2,856 

140 

F; /TIMIT/TRAIN/DR2/MJDE0/SX130 .WAV 

3,607 

252 

1,972 

2,832 

144 

F: /TIMIT/TRAIN/DR2/MJDE0/SX130 .WAV 

3,863 

276 

1,972 

2,864 

150 

F: /TIMIT/TRAIN/DR2/MJDE0/SX130 .WAV 

284 

2,020 

2,791 

160 

F: /TIMIT/TRAIN/DR2/MJDE0/SX130 .WAV 

4,631 

316 

1,963 

2,759 

163 

331 


F: /TIMIT/TRAIN/DR2/MJDE0/SX130 .WAV 

4,887 

333 

2,020 

2,718 

160 

F: /TIMIT/TRAIN/DR2/MJEB0/SX170 .WAV 

40,797 

325 

2,045 

2,369 

128 

F: /TIMIT/TRAIN/DR2/MJHI0/SX248 .WAV 

4,897 

292 

2,045 

2,442 

111 

F: /TIMIT/TRAIN/DR2/MJMA0/SX145 .WAV 

17,935 

406 

1,923 

2,588 

113 

F: /TIMIT/TRAIN/DR2/MJMA0/SX145 .WAV 

18,191 

398 

2,020 

2,653 

114 

F: /TIMIT/TRAIN/DR2/MJMA0/SX145 .WAV 

18,447 

308 

2,101 

2,783 

114 

F: /TIMIT/TRAIN/DR2/MJMA0/SX145 .WAV 

18,703 

300 

2,174 

2,702 

115 

F: /TIMIT/TRAIN/DR2/MJMA0/SX145 .WAV 

18,959 

292 

2,272 

2,775 

114 

F: /TIMIT/TRAIN/DR2/MJMD0/SX128 .WAV 

17,489 

292 

2,077 

2,442 

110 

F: /TIMIT/TRAIN/DR2/MJMD0/SX128 .WAV 

17,745 

284 

2,158 

2,499 

107 

F: /TIMIT/TRAIN/DR2/MJMD0/SX128 .WAV 

18,001 

284 

2,126 

2,507 

106 

F : /TIMIT/TRAIN/DR2 /MJPMO /SX18 . WAV 

9,796 

462 

1,874 

2,207 

123 

F: /TIMIT/TRAIN/DR2/MJPM0/SX18 .WAV 

10,564 

487 

1,874 

2,215 

117 

F ; /TIMIT/TRAIN/DR2 /MJRPO /SX13  5 .WAV 

25,860 

284 

2,093 

2,588 

163 

F: /TIMIT/TRAIN/DR2/MJRP0/SX135 .WAV 

26,116 

276 

2,101 

2,734 

165 

F: /TIMIT/TRAIN/DR2/MKAH0/SX178 .WAV 

44,599 

243 

2,101 

2,686 

119 

F: /TIM1T/TRAIN/DR2/MKAJ0/SX154 .WAV 

17,815 

503 

1,817 

2,613 

104 

F : /TIMIT/TRAIN/DR2/MKDT0/SX173 . WAV 

47,535 

260 

2,004 

2,458 

113 

F: /TIMIT/TRAIN/DR2/MKDT0/SX173 .WAV 

47,791 

252 

2,061 

2,523 

112 

F: /TIMIT/TRAIN/DR2 /MKDT0/SX173 .WAV 

48,047 

284 

2,028 

2,556 

110 

F: /TIMIT/TRAIN/DR2/MKDT0/SX173 .WAV 

48,303 

268 

1,955 

2,394 

110 

F: /TIMIT/TRAIN/DR2/MKJOO/SX167 .WAV 

32,469 

260 

2,012 

2,385 

116 

F: /TIMIT/TRAIN/DR2/MKJOO/SX167 .WAV 

32,725 

243 

2,053 

2,467 

115 

F: /TIMIT/TRAIN/DR2/MKJOO/SX167 .WAV 

32,981 

252 

2,061 

2,540 

112 

F : /TIMIT/TRAIN/DR2 /MKJOO/SX167 .WAV 

33,237 

252 

1,996 

2,621 

114 

F: /TIMIT/TRAIN/DR2/MMAG0/SX136.WAV 

57,570 

430 

2,215 

2,613 

93 

F : /TIMIT/TRAIN/DR2 /MMAGO /SX13 6 . WAV 

57,826 

438 

2,264 

2,645 

90 

F: /TIMIT/TRAIN/DR2/MMDS0/SX263 .WAV 

10,155 

308 

1,696 

2,402 

130 

F: /TIMIT/TRAIN/DR2/MMDS0/SX263 .WAV 

10,411 

308 

1,777 

2,353 

128 

F: /TIMIT/TRAIN/DR2/MMDS0/SX263 .WAV 

10,667 

316 

1,866 

2,272 

128 

F : /TIMIT/TRAIN/DR2/MMGK0/SX152 .WAV 

5,231 

308 

2,012 

2,807 

151 

F: /TIMIT/TRAIN/DR2/MMGK0/SX152 .WAV 

5,487 

325 

2,028 

2,556 

150 

F: /TIMIT/TRAIN/DR2/MMGK0/SX152 .WAV 

5,743 

316 

1,842 

2,556 

144 

F: /TIMIT/TRAIN/DR2/MMGK0/SX152 .WAV 

5,999 

300 

1,834 

2,450 

145 

F: /TIMIT/TRAIN/DR2/MMGK0/SX152 .WAV 

6,255 

381 

1,761 

2,434 

151 

F : /TIMIT/TRAIN/DR2 /MMXSO /SX3  3  6 . WAV 

48,172 

373 

1,915 

2,231 

144 

F : /TIMIT/TRAIN/DR2/MMXS0/SX336 .WAV 

48,428 

365 

1,761 

2,085 

144 

F: /TIMIT/TRAIN/DR2/MPPC0/SX152 .WAV 

6,547 

316 

2,337 

2,702 

172 

F: /TIMIT/TRAIN/DR2/MPPC0/SX152 .WAV 

6,803 

349 

2,247 

2,718 

176 

332 


F: /TIMIT/TRAIN/DR2/MPRB0/SX125  .WAV 

27,706 

454 

2,304 

2,994 

99 

F : /TIMIT/TRAIN/DR2/MPRB0/SX125 .WAV 

27,962 

430 

2,426 

2,953 

100 

F : /TIMIT/TRAIN/DR2/MPRB0/SX125 . WAV 

28,218 

414 

2,499 

3,010 

98 

F: /TIMIT/TRAIN/DR2/MRAB0/SX144 .WAV 

9,581 

349 

1,460 

2,264 

133 

F : /TIMIT/TRAIN/DR2/MRFK0/SX176  .WAV 

48,852 

300 

1,882 

2,548 

96 

F; /TIMIT/TRAIN/DR2/MRFK0/SX176  .WAV 

49,108 

300 

1,907 

2,507 

100 

F: /TIMIT/TRAIN/DR2/MRFK0/SX176 .WAV 

49,364 

292 

1,963 

2,499 

99 

F: /TIMIT/TRAIN/DR2/MRFK0/SX176 .WAV 

49,620 

276 

1,972 

2,499 

94 

F : /TIMIT/TRAIN/DR2/MRFK0/SX176  .WAV 

49,876 

276 

2,004 

2,475 

92 

F: /TIMIT/TRAIN/DR2/MRFK0/SX176 .WAV 

50,132 

284 

1,996 

2,450 

87 

F: /TIMIT/TRAIN/DR2/MRGS0/SX276 .WAV 

20,098 

316 

1,955 

2,653 

122 

F: /TIMIT/TRAIN/DR2/MRGS0/SX276 .WAV 

20,354 

316 

2,028 

2,694 

123 

F : /TIMIT/TRAIN/DR2/MRGS0/SX276  .WAV 

20,610 

300 

2,061 

2,694 

123 

F: /TIMIT/TRAIN/DR2/MRGS0/SX276 .WAV 

20,866 

292 

2,037 

2,483 

122 

F: /TIMIT/TRAIN/DR2/MRHL0/SX165 .WAV 

12,625 

446 

1,501 

2,596 

150 

F: /TIMIT/TRAIN/DR2/MRHL0/SX165 .WAV 

12,881 

422 

2,101 

2,604 

150 

F: /TIMIT/TRAIN/DR2/MRJHO/SX307 .WAV 

6,615 

365 

2,345 

2,775 

150 

F: /TIMIT/TRAIN/DR2/MRJH0/SX307 .WAV 

6,871 

316 

2,353 

2,742 

138 

F; /TIMIT/TRAIN/DR2/MRJH0/SX307 .WAV 

7,127 

316 

2,361 

2,751 

128 

F:/TIMIT/TRAIN/DR2/MRJH0/SX307 .WAV 

7,639 

284 

2,377 

2,791 

115 

F: /TIMIT/TRAIN/DR2/MRJM0/SX148  .WAV 

10,407 

284 

2,110 

2,467 

155 

F: /TIMIT/TRAIN/DR2/MRJM0/SX148 .WAV 

10,663 

316 

2,174 

2,613 

147 

F: /TIMIT/TRAIN/DR2/MRJM1/SX128 .WAV 

17,671 

300 

2,045 

2,320 

126 

F: /TIMIT/TRAIN/DR2/MRJM1/SX128 .WAV 

17,927 

276 

2,053 

2,434 

123 

F : /TIMIT/TRAIN/DR2 /MRJM1/SX128 .WAV 

18,183 

276 

1,963 

2,475 

120 

F: /TIMIT/TRAIN/DR2/MRJM1/SX128 .WAV 

18,439 

292 

1,882 

2,515 

114 

F: /TIMIT/TRAIN/DR2/MRJT0/SX148 .WAV 

9,401 

325 

1,785 

2,134 

97 

F: /TIMIT/TRAIN/DR2/MRJT0/SX148  .WAV 

9,657 

300 

1,826 

2,215 

96 

F: /TIMIT/TRAIN/DR2/MRLJ0/SX160 .WAV 

40,543 

'  276 

2,304 

2,783 

92 

F: /TIMIT/TRAIN/DR2/MRLJ0/SX160 .WAV 

40,799 

276 

2,288 

2,783 

92 

F: /TIMIT/TRAIN/DR2/MRLJ0/SX160 .WAV 

41,055 

284 

2,215 

2,742 

92 

F: /TIMIT/TRAIN/DR2/MRLR0/SX206  .WAV 

7,415 

325 

2,012 

2,653 

147 

F: /TIMIT/TRAIN/DR2/MRLR0/SX206 .WAV 

7,671 

414 

1,834 

2,645 

148 

F: /TIMIT/TRAIN/DR2/MRLRO/SX206 .WAV 

7,927 

479 

1,436 

2,629 

150 

F: /TIMIT/TRAIN/DR2/MRMSO/SX120 .WAV 

25,325 

292 

2,150 

2,418 

110 

F: /TIMIT/TRAIN/DR2/MSAT0/SX176 .WAV 

41,655 

308 

1,777 

2,531 

92 

F: /TIMIT/TRAIN/DR2 /MSAT0/SX176 .WAV 

41,911 

316 

1,793 

2,531 

87 

F: /TIMIT/TRAIN/DR2/MSAT0/SX176 .WAV 

42,167 

308 

1,817 

2,548 

86 

F:  /TIMIT/TRAIN/DR2/MSAT0/SX176  .WAV 


42,423 


284 


1,842 


2,523 


85 


F: /TIMIT/TRAIN/DR2/MSAT0/SX176 .WAV 

42,679 

292 

1,850 

2,515 

81 

F : /TIMIT/TRAIN/DR2/MTAT1/SX149 . WAV 

55,655 

430 

1,963 

2,312 

118 

F: /TIMIT/TRAIN/DR2/MTAT1/SX149 .WAV 

55,911 

325 

1,980 

2,613 

118 

F: /TIMIT/TFAIN/DR2/MTAT1/SX149 .WAV 

56,167 

325 

2,004 

2,629 

116 

F: /TIMIT/TRAIN/DR2/MTAT1/SX149 .WAV 

56,423 

333 

1,955 

2,531 

114 

F: /TIMIT/TRAIN/DR2/MTBC0/SX183 .WAV 

15,252 

487 

2,045 

2,661 

110 

F: /TIMIT/TRAIN/DR2/MTBC0/SX183 .WAV 

15,508 

479 

2,126 

2,669 

110 

F: /TIMIT/TRAIN/DR2/MTBC0/SX183 .WAV 

15,764 

479 

2,191 

2,621 

109 

F: /TIMIT/TRAIN/DR2/MTBC0/SX183 .WAV 

16,276 

462 

2,150 

2,499 

107 

F: /TIMIT/TRAIN/DR2/MTBC0/SX183 .WAV 

16,532 

471 

2,053 

2,434 

101 

F: /TIMIT/TRAIN/DR2/MTBC0/SX183 .WAV 

16,788 

495 

1,817 

2,361 

99 

F: /TIMIT/TRAIN/DR2/MTDB0/SX321 .WAV 

44,565 

398 

1,744 

2,442 

100 

F: /TIMIT/TRAIN/DR2/MTDB0/SX321 .WAV 

44,821 

333 

1,874 

2,353 

97 

F : /TIMIT/TRAIN/DR2 /MTDBO /SX32 1 . WAV 

45,077 

373 

1,834 

2,385 

98 

F : /TIMIT/TRAIN/DR2/MTDB0/SX321 .WAV 

45,333 

430 

1,761 

2,385 

98 

F: /TIMIT/TRAIN/DR2/MTJG0/SX170 .WAV 

10,855 

292 

2,061 

2,669 

148 

F : /TIMIT/TRAIN/DR2/MTJG0/SX170 . WAV 

11,111 

260 

2,077 

2,637 

150 

F: /TIMIT/TRAIN/DR2/MTJG0/SX170 .WAV 

11,367 

316 

2,045 

2,645 

148 

F: /TIMIT/TRAIN/DR2/MTJG0/SX170 .WAV 

'  11,623 

325 

1,939 

2,548 

147 

F: /TIMIT/TRAIN/DR2/MWSB0/SX276 .WAV 

14,088 

292 

1,712 

3,108 

94 

F: /TIMIT/TRAIN/DR2/MZMB0/SX176 .WAV 

50,855 

300 

2,061 

2,856 

103 

F : /TIMIT/TRAIN/DR2 /MZMBO /SX17  6 . WAV 

51,111 

292 

2,093 

2,897 

102| 

F:  /TIMIT/TRAIN/DR2/MZMB0/SX176  .WAV 

51,367 

292 

2,101 

2,840 

95 

F: /TIMIT/TRAIN/DR2/MZMB0/SX176 .WAV 

51,623 

292 

2,053 

2,815 

90 

F: /TIMIT/TRAIN/DR2/MZMB0/SX176 .WAV 

51,879 

292 

2,110 

2,832 

88 

F : /TIMIT/TRA1N/DR2/MZMB0/SX176 . WAV 

52,391 

284 

2,077 

2,767 

97 
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